1. Introduction. Blackwell's approachability theory and its variants have become a standard and
useful tool in analyzing online learning algorithms (Cesa-Bianchi and Lugosi [5]) and algorithms for
learning in games (Hart and Mas-Colell [13, 14]). The first application of Blackwell's approachability
to learning in the online setup is due to Blackwell himself. Numerous other contributions are summarized
in the monograph by Cesa-Bianchi and Lugosi [5]. Blackwell's approachability theory enjoys a clear
geometric interpretation that allows it to be used in situations where online convex optimization or
exponential weights do not seem to be easily applicable and, in some sense, to go beyond the minimization
of the regret and/or to control quantities of a different flavor; e.g., in the article by Mannor et al. [20],
to minimize the regret together with path constraints, and in the one by Mannor and Shimkin [18], to
minimize the regret in games whose stage duration is not fixed. Recently, it has been shown by
Abernethy et al. that approachability and low-regret learning are equivalent in the sense that efficient
reductions exist from one to the other. Another recent paper by Rakhlin et al. [27] showed that
approachability can be analyzed from the perspective of learnability using tools from learning theory.

In this paper we consider approachability and online learning with partial monitoring in games against 
Nature. In partial monitoring the decision maker does not know how much reward was obtained and only 
gets a (random) signal whose distribution depends on the action of the decision maker and the action 
of Nature. There are two extremes of this setup that are well studied. On the one extreme we have the 
case where the signal includes the reward itself (or a signal that can be used to unbiasedly estimate the 
reward), which is essentially the celebrated bandits setup. The other extreme is the case where the signal 
is not informative (i.e., it tells the decision maker nothing about the actual reward obtained); this setting 
then essentially consists of repeating the same situation over and over again, as no information is gained 
over time. We consider a setup encompassing these situations and more general ones, in which the signal 
is indicative of the actual reward, but is not necessarily a sufficient statistic thereof. The difficulty is
that the decision maker can compute neither the actual reward he obtained nor the actions of Nature.

Regret minimization with partial monitoring has been studied in several papers in the learning theory
community. Piccolboni and Schindelhauer [26], Mannor and Shimkin [17], and Cesa-Bianchi et al. [6]
study special cases where an accurate estimation of the rewards (or worst-case rewards) of the decision
maker is possible thanks to some extra structure. A general policy with vanishing regret is presented
by Lugosi et al. [16]. This policy is based on exponential weights and a specific estimation procedure
for the (worst-case) obtained rewards. In contrast, we provide approachability-based results for the
problem of regret minimization. En route, we define a new type of approachability setup, which enables
us to re-derive the extension of approachability to the partial monitoring vector-valued setting proposed
by Perchet [23]. More importantly, we provide concrete algorithms for this approachability problem that
are more efficient in the sense that, unlike previous works in the domain, their complexity is constant over
all steps. Moreover, their rates of convergence are independent of the game at hand, as in the seminal
paper by Blackwell, but for the first time in this general framework. For example, the recent purely
theoretical (and fairly technical) study of approachability by Perchet and Quincampoix [25], which is based
on somewhat related arguments, provides neither rates of convergence nor concrete algorithms for this
matter.



Outline. The paper is organized as follows. In Section 2 we recall some basic facts from approachability
theory in the standard vector-valued games setting where a decision maker is engaged in a repeated
vector-valued game against an arbitrary opponent (or "Nature"). In Section 3 we propose a novel setup for
approachability, termed "robust approachability," where instead of obtaining a vector-valued reward,
the decision maker obtains a set that represents the ambiguity concerning his reward. We provide a
simple characterization of approachable convex sets and an algorithm for the set-valued reward setup
under the assumption that the set-valued reward functions are linear. In Section 4 we extend the robust
approachability setup to problems where the set-valued reward functions are not linear, but rather concave
in the mixed action of the decision maker and convex in the mixed action of Nature. In Section 5 we
show how to apply the robust approachability framework to repeated vector-valued games with partial
monitoring. In Section 6 we consider a special type of games where the signaling structure possesses a
special property, called bi-piecewise linearity, that can be exploited to derive efficient strategies. This
type of games is rich enough to encompass several useful special cases. In Section 6.1 we provide a
simple and constructive algorithm for these games. Previous results for approachability in this setup
were either non-constructive (Rustichini [29]) or highly inefficient, as they relied on some sort of
lifting to the space of probability measures on mixed actions (Perchet [23]) and typically required a grid
that is progressively refined (leading to a step complexity that is exponential in the number T of past
steps). In Section 6.2 we apply our results to both external-regret and internal-regret minimization in
repeated games with partial monitoring. In both cases our proofs are simple, lead to algorithms with
constant complexity at each step, and are accompanied by rates. Our results for external regret have
rates similar to the ones obtained by Lugosi et al. [16], but our proof is direct and simpler. In Section 7
we mention the general signaling case and explain how it is possible to approach certain special sets such
as polytopes efficiently and general convex sets, although inefficiently.



2. Some basic facts from approachability theory. In this section we recall the most basic 
versions of Blackwell's approachability theorem for vector-valued payoff functions. 

We consider a vector-valued game between two players, a decision maker (first player) and Nature 
(second player), with respective finite action sets $\mathcal{A}$ and $\mathcal{B}$, whose cardinalities
are referred to as $N_{\mathcal{A}}$ and $N_{\mathcal{B}}$. We denote by $d$ the dimension of the reward
vector and equip $\mathbb{R}^d$ with the $\ell^2$-norm $\|\cdot\|_2$. The payoff function of the first player
is given by a mapping $m : \mathcal{A} \times \mathcal{B} \to \mathbb{R}^d$, which is multi-linearly extended to
$\Delta(\mathcal{A}) \times \Delta(\mathcal{B})$, the set of product-distributions over $\mathcal{A} \times \mathcal{B}$.

We consider two frameworks, depending on whether pure or mixed actions are taken. 



Pure actions taken and observed. We denote by $A_1, A_2, \ldots$ and $B_1, B_2, \ldots$ the actions in
$\mathcal{A}$ and $\mathcal{B}$ sequentially taken by each player; they are possibly given by randomized
strategies, i.e., the actions $A_t$ and $B_t$ were obtained by random draws according to respective
probability distributions denoted by $x_t \in \Delta(\mathcal{A})$ and $y_t \in \Delta(\mathcal{B})$. For now,
we assume that the first player has a full or bandit monitoring of the pure actions taken by the opponent
player: at the end of round $t$, when receiving the payoff $m(A_t, B_t)$, either the pure action $B_t$
(full monitoring) or only the indicated payoff (bandit monitoring) is revealed to him.



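Definition 2.1 A set $\mathcal{C} \subseteq \mathbb{R}^d$ is $m$-approachable with pure actions if there
exists a strategy of the first player such that, for all $\varepsilon > 0$, there exists an integer
$T_\varepsilon$ such that for all strategies of the second player,

$$\mathbb{P}\left\{ \forall\, T \geq T_\varepsilon, \quad \inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} m(A_t, B_t) \right\|_2 \leq \varepsilon \right\} \geq 1 - \varepsilon .$$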



In particular, the first player has a strategy that ensures that the average of his vector-valued payoffs 
converges almost surely to the set C (uniformly with respect to the strategies of the second player). 

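The above convergence will be achieved in the course of this paper under two forms. Most often we
will exhibit strategies such that, for all strategies of the second player, for all $\delta > 0$, with
probability at least $1 - \delta$,

$$\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} m(A_t, B_t) \right\|_2 \leq \beta(T, \delta) .$$

A union bound shows that such strategies $m$-approach $\mathcal{C}$ as soon as there exists a positive
sequence $\varepsilon_T$ such that $\sum_T \varepsilon_T$ is finite and $\beta(T, \varepsilon_T) \to 0$.
Sometimes we will also deal with strategies directly ensuring that, for all strategies of the second
player, for all $\delta > 0$, with probability at least $1 - \delta$,

$$\sup_{\tau \geq T} \, \inf_{c \in \mathcal{C}} \left\| c - \frac{1}{\tau} \sum_{t=1}^{\tau} m(A_t, B_t) \right\|_2 \leq \beta(T, \delta) .$$

Such strategies $m$-approach $\mathcal{C}$ as soon as $\beta(T, \delta) \to 0$ for all $\delta > 0$.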
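
The union-bound argument mentioned above can be spelled out as follows; this is only a sketch, in which the shorthand $d_T$ for the distance of the average payoff to the target set and the index $T_0$ are introduced here for illustration.

```latex
% Shorthand (assumed for this sketch):
%   d_T = \inf_{c \in \mathcal{C}} \| c - \frac{1}{T} \sum_{t=1}^{T} m(A_t, B_t) \|_2 .
% Apply the high-probability bound at each T with confidence level \delta = \varepsilon_T:
\mathbb{P}\bigl\{ \exists\, T \geq T_0 : \ d_T > \beta(T, \varepsilon_T) \bigr\}
  \;\leq\; \sum_{T \geq T_0} \mathbb{P}\bigl\{ d_T > \beta(T, \varepsilon_T) \bigr\}
  \;\leq\; \sum_{T \geq T_0} \varepsilon_T .
% Given \varepsilon > 0, pick T_\varepsilon such that
%   \sum_{T \geq T_\varepsilon} \varepsilon_T \leq \varepsilon
%   and \beta(T, \varepsilon_T) \leq \varepsilon for all T \geq T_\varepsilon ;
% both are possible since the series converges and \beta(T, \varepsilon_T) \to 0. Then
\mathbb{P}\bigl\{ \forall\, T \geq T_\varepsilon, \ d_T \leq \varepsilon \bigr\} \;\geq\; 1 - \varepsilon .
```

Since the bounds hold for all strategies of the second player, the resulting $T_\varepsilon$ does not depend on that strategy.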



Mixed actions taken and observed. In this case, we denote by $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ the
actions in $\Delta(\mathcal{A})$ and $\Delta(\mathcal{B})$ sequentially taken by each player. We also assume
a full or bandit monitoring for the first player: at the end of round $t$, when receiving the payoff
$m(x_t, y_t)$, either the mixed action $y_t$ (full monitoring) or the indicated payoff (bandit monitoring)
is revealed to him.

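Definition 2.2 A set $\mathcal{C} \subseteq \mathbb{R}^d$ is $m$-approachable with mixed actions if there
exists a strategy of the first player such that, for all $\varepsilon > 0$, there exists an integer
$T_\varepsilon$ such that for all strategies of the second player,

$$\mathbb{P}\left\{ \forall\, T \geq T_\varepsilon, \quad \inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} m(x_t, y_t) \right\|_2 \leq \varepsilon \right\} \geq 1 - \varepsilon .$$

As indicated below, in this setting the first player may even have deterministic strategies such that,
for all (deterministic or randomized) strategies of the second player,

$$\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} m(x_t, y_t) \right\|_2 \leq \beta(T)$$

with probability 1, where $\beta(T) \to 0$.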



Necessary and sufficient condition for approachability. For closed convex sets there is a simple 
characterization of approachability that is a direct consequence of the minimax theorem; the condition 
is the same for the two settings, whether pure or mixed actions are taken and observed. 

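Theorem 2.1 (Theorem 3 of Blackwell) A closed convex set $\mathcal{C} \subseteq \mathbb{R}^d$ is approachable
(with pure or mixed actions) if and only if

$$\forall\, y \in \Delta(\mathcal{B}), \quad \exists\, x \in \Delta(\mathcal{A}), \qquad m(x, y) \in \mathcal{C} .$$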



An associated strategy (that is efficient depending on the geometry of C). Blackwell suggested 
a simple strategy with a geometric flavor; it only requires a bandit monitoring. 

Play an arbitrary $x_1$. For $t \geq 1$, given the vector-valued quantities

$$\hat{m}_t = \frac{1}{t} \sum_{s=1}^{t} m(A_s, B_s) \qquad \text{or} \qquad \hat{m}_t = \frac{1}{t} \sum_{s=1}^{t} m(x_s, y_s) ,$$

depending on whether pure or mixed actions are taken and observed, compute the projection $c_t$ (in
$\ell^2$-norm) of $\hat{m}_t$ on $\mathcal{C}$. Find a mixed action $x_{t+1}$ that solves the minimax equation

$$\min_{x \in \Delta(\mathcal{A})} \; \max_{y \in \Delta(\mathcal{B})} \; \bigl\langle \hat{m}_t - c_t, \, m(x, y) \bigr\rangle , \qquad (1)$$

where $\langle \,\cdot\, , \,\cdot\, \rangle$ is the Euclidean inner product in $\mathbb{R}^d$. In the case when
pure actions are taken and observed, draw $A_{t+1}$ at random according to $x_{t+1}$.

The minimax problem used above to determine $x_{t+1}$ is easily seen to be a (scalar) zero-sum game and
is therefore efficiently solvable using, e.g., linear programming: the associated complexity is polynomial
in $N_{\mathcal{A}}$ and $N_{\mathcal{B}}$. All in all, this strategy is efficient if the computations of the
required projections onto $\mathcal{C}$ in $\ell^2$-norm can be performed efficiently.
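
The strategy described above can be sketched in a few lines of Python. The sketch below is illustrative only and rests on several assumptions that are not in the text: a hypothetical 2x2 game whose vector payoff lists the regrets of the first player, the nonpositive orthant as target set (so that the projection is a componentwise clipping), mixed actions taken and observed, and a first player with only two actions, which lets the scalar zero-sum game be solved exactly by inspecting endpoints and the crossing point of two lines instead of calling a linear-programming solver.

```python
import math

# Hypothetical 2x2 example: reward r(a, b) = 1 if a == b else 0; the vector
# payoff m(a, b) lists the regrets of action a against actions 0 and 1.
def m(a, b):
    r = lambda i, j: 1.0 if i == j else 0.0
    return (r(0, b) - r(a, b), r(1, b) - r(a, b))

def project(v):
    """Euclidean projection onto the assumed target set (nonpositive orthant)."""
    return tuple(min(vi, 0.0) for vi in v)

def solve_minimax(direction):
    """Minimize over p in [0, 1] the max over b of <direction, m(x_p, b)>,
    where x_p plays action 1 with probability p (two actions per player)."""
    s = [[sum(d * mi for d, mi in zip(direction, m(a, b))) for b in (0, 1)]
         for a in (0, 1)]
    value = lambda p: max((1 - p) * s[0][b] + p * s[1][b] for b in (0, 1))
    candidates = [0.0, 1.0]
    # A maximum of two lines is minimized at an endpoint or where they cross.
    slope_gap = (s[1][0] - s[0][0]) - (s[1][1] - s[0][1])
    if abs(slope_gap) > 1e-12:
        p_cross = (s[0][1] - s[0][0]) / slope_gap
        if 0.0 <= p_cross <= 1.0:
            candidates.append(p_cross)
    return min(candidates, key=value)

def run(T):
    """Play T rounds and return the distances of the average payoff to the target."""
    p = 0.5                        # arbitrary first mixed action x_1
    avg = (0.0, 0.0)               # running average of m(x_t, y_t)
    distances = []
    for t in range(1, T + 1):
        b = 0 if p >= 0.5 else 1   # Nature tries to mismatch the likelier action
        payoff = tuple((1 - p) * u + p * w for u, w in zip(m(0, b), m(1, b)))
        avg = tuple(av + (pay - av) / t for av, pay in zip(avg, payoff))
        c = project(avg)
        direction = tuple(av - ci for av, ci in zip(avg, c))
        distances.append(math.sqrt(sum(d * d for d in direction)))
        p = solve_minimax(direction)   # next mixed action x_{t+1}
    return distances

distances = run(1000)
```

In this toy game every payoff vector has Euclidean norm 1, so the distances can be compared round by round with a $2/\sqrt{t}$ envelope. For larger action sets the exact two-line solver would have to be replaced by a general zero-sum game solver.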

The strategy presented above enjoys the following rates of convergence for approachability. 

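Theorem 2.2 (Theorem 3 of Blackwell; Theorem II.4.3 of Mertens et al. [21]) We denote by $M$ a bound
in norm over $m$, i.e.,

$$\max_{(a,b) \in \mathcal{A} \times \mathcal{B}} \| m(a,b) \|_2 \leq M .$$

With mixed actions taken and observed, the above strategy ensures that for all strategies of the second
player, with probability 1,

$$\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} m(x_t, y_t) \right\|_2 \leq \frac{2M}{\sqrt{T}} ,$$

while with pure actions taken and observed, for all $\delta \in (0, 1)$ and for all strategies of the second
player, with probability at least $1 - \delta$,

$$\sup_{\tau \geq T} \, \inf_{c \in \mathcal{C}} \left\| c - \frac{1}{\tau} \sum_{t=1}^{\tau} m(A_t, B_t) \right\|_2 \leq [\ldots] .$$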



An alternative strategy in the case where pure actions are taken and observed. Convergence
rates of a slightly different flavor (but still implying approachability) can be proved, in the full monitoring
case, by modifying the above procedure as follows. For $t \geq 1$, consider instead the vector-valued quantity

$$\hat{m}_t = \frac{1}{t} \sum_{s=1}^{t} m(x_s, B_s) ,$$

compute its projection $c_t$ (in $\ell^2$-norm) on $\mathcal{C}$, and solve the associated minimax problem (1).

This modified strategy enjoys the following rates of convergence for approachability when pure actions 
are taken and observed. 



Theorem 2.3 (Section 7.7 and Exercise 7.23 of Cesa-Bianchi et al. [6]) We denote by $M$ a
bound in norm over $m$, i.e.,

$$\max_{(a,b) \in \mathcal{A} \times \mathcal{B}} \| m(a,b) \|_2 \leq M .$$

With pure actions taken and observed, the above strategy ensures that for all strategies of the second
player, with probability at least $1 - \delta$,

$$\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} m(A_t, B_t) \right\|_2 \leq \frac{2M}{\sqrt{T}} \Bigl( 1 + 2 \sqrt{\ln(2/\delta)} \Bigr) .$$



In the next section, we will rather resort to this slightly modified procedure as the form of the resulting
bounds is closer to the one derived in the main section (Section 6) of this paper.

3. Robust approachability for finite set-valued games. In this section we extend the results
from the previous section to set-valued payoff functions in the case of full monitoring. We denote by
$\mathcal{S}(\mathbb{R}^d)$ the set of all subsets of $\mathbb{R}^d$ and consider a set-valued payoff function
$\overline{m} : \mathcal{A} \times \mathcal{B} \to \mathcal{S}(\mathbb{R}^d)$.

Pure actions taken and observed. At each round $t$, the players simultaneously choose respective
actions $A_t \in \mathcal{A}$ and $B_t \in \mathcal{B}$, possibly at random according to mixed distributions
$x_t$ and $y_t$. Full monitoring takes place for the first player: he observes $B_t$ at the end of round $t$.
However, as a result, the first player gets the subset $\overline{m}(A_t, B_t)$ as a payoff. This models the
ambiguity or uncertainty associated with some true underlying payoff gained.

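We extend $\overline{m}$ multi-linearly to $\Delta(\mathcal{A}) \times \Delta(\mathcal{B})$ and even to
$\Delta(\mathcal{A} \times \mathcal{B})$, the set of joint probability distributions on
$\mathcal{A} \times \mathcal{B}$, as follows. Let

$$\mu = (\mu_{a,b})_{(a,b) \in \mathcal{A} \times \mathcal{B}}$$

be such a joint probability distribution; then $\overline{m}(\mu)$ is defined as a finite convex
combination$^1$ of subsets of $\mathbb{R}^d$,

$$\overline{m}(\mu) = \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} \mu_{a,b} \, \overline{m}(a, b) .$$

When $\mu$ is the product-distribution of some $x \in \Delta(\mathcal{A})$ and $y \in \Delta(\mathcal{B})$, we
use the notation $\overline{m}(\mu) = \overline{m}(x, y)$. We denote by

$$\pi_T = \frac{1}{T} \sum_{t=1}^{T} \delta_{(A_t, B_t)}$$

the empirical distribution of the pairs $(A_t, B_t)$ of actions taken during the first $T$ rounds, and will be
interested in the behavior of

$$\frac{1}{T} \sum_{t=1}^{T} \overline{m}(A_t, B_t) ,$$

which can also be rewritten here in a compact way as $\overline{m}(\pi_T)$, by linearity of the extension of
$\overline{m}$.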

The distance of this set $\overline{m}(\pi_T)$ to the target set $\mathcal{C}$ will be measured in a
worst-case sense: we denote by

$$\varepsilon_T = \sup_{d \in \overline{m}(\pi_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2$$

the smallest value such that $\overline{m}(\pi_T)$ is included in an $\varepsilon_T$-neighborhood of
$\mathcal{C}$. Robust approachability of a set $\mathcal{C}$ with the set-valued payoff function
$\overline{m}$ then simply means that the sequence of $\varepsilon_T$ tends almost surely
to 0, uniformly with respect to the strategies of the second player.
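
To make the worst-case distance concrete, here is a small Python sketch. It assumes, purely for illustration, that each payoff set is a finite set of points and that the target set is the nonpositive orthant; the names `minkowski_combination`, `dist_to_orthant`, and `worst_case_distance` are hypothetical helpers, not notation from the text.

```python
import math
from itertools import product

def minkowski_combination(weights, sets):
    """Return sum_k weights[k] * sets[k] for finite sets of points (tuples)."""
    points = set()
    for choice in product(*sets):          # one point picked in each set
        dim = len(choice[0])
        points.add(tuple(sum(w * pt[i] for w, pt in zip(weights, choice))
                         for i in range(dim)))
    return points

def dist_to_orthant(d):
    """Euclidean distance from d to the nonpositive orthant (assumed target set)."""
    return math.sqrt(sum(max(di, 0.0) ** 2 for di in d))

def worst_case_distance(point_set):
    """sup over d in the set of the distance from d to the target set."""
    return max(dist_to_orthant(d) for d in point_set)

# Two hypothetical payoff sets, each played half of the time.
S1 = [(0, 0), (1, -1)]
S2 = [(-1, 0), (0, 2)]
combo = minkowski_combination((0.5, 0.5), [S1, S2])
eps = worst_case_distance(combo)
```

Here the combination contains four points and the worst one, $(0, 1)$, lies at distance 1 from the orthant, even though two of the four points belong to the target set: a single bad selection in the payoff sets is enough to make the robust distance large.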

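Definition 3.1 A set $\mathcal{C} \subseteq \mathbb{R}^d$ is $\overline{m}$-robust approachable with pure
actions if there exists a strategy of the first player such that, for all $\varepsilon > 0$, there exists an
integer $T_\varepsilon$ such that for all strategies of the second player,

$$\mathbb{P}\left\{ \forall\, T \geq T_\varepsilon, \quad \sup_{d \in \overline{m}(\pi_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 \leq \varepsilon \right\} \geq 1 - \varepsilon .$$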



Mixed actions taken and observed. At each round $t$, the players simultaneously choose respective
mixed actions $x_t \in \Delta(\mathcal{A})$ and $y_t \in \Delta(\mathcal{B})$. Full monitoring still takes place
for the first player: he observes $y_t$ at the end of round $t$; he however gets the subset
$\overline{m}(x_t, y_t)$ as a payoff (which, again, accounts for the uncertainty).

The product-distribution of two elements $x = (x_a)_{a \in \mathcal{A}} \in \Delta(\mathcal{A})$ and
$y = (y_b)_{b \in \mathcal{B}} \in \Delta(\mathcal{B})$ will be denoted by $x \otimes y$; it gives a probability
mass of $x_a y_b$ to each pair $(a, b) \in \mathcal{A} \times \mathcal{B}$. We consider the
empirical joint distribution of mixed actions taken during the first $T$ rounds,

$$\nu_T = \frac{1}{T} \sum_{t=1}^{T} x_t \otimes y_t ,$$

and will be interested in the behavior of

$$\frac{1}{T} \sum_{t=1}^{T} \overline{m}(x_t, y_t) ,$$

which can also be rewritten here in a compact way as $\overline{m}(\nu_T)$, by linearity of the extension of
$\overline{m}$.

$^1$ For two sets $S, T$ and $\alpha \in [0, 1]$, the convex combination $\alpha S + (1 - \alpha) T$ is defined as
$\{ \alpha s + (1 - \alpha) t : \ s \in S \text{ and } t \in T \}$.

Definition 3.2 A set $\mathcal{C} \subseteq \mathbb{R}^d$ is $\overline{m}$-robust approachable with mixed
actions if there exists a strategy of the first player such that, for all $\varepsilon > 0$, there exists an
integer $T_\varepsilon$ such that for all strategies of the second player,

$$\mathbb{P}\left\{ \forall\, T \geq T_\varepsilon, \quad \sup_{d \in \overline{m}(\nu_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 \leq \varepsilon \right\} \geq 1 - \varepsilon .$$

Actually, the bounds exhibited below in this setting will be of the form

$$\sup_{d \in \overline{m}(\nu_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 \leq \beta(T)$$

with probability 1 and uniformly over all (deterministic or randomized) strategies of the second player,
where $\beta(T) \to 0$, and for deterministic strategies of the first player.

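A useful continuity lemma. Before proceeding we provide a continuity lemma. It can be reformulated
as indicating that for all joint distributions $\mu$ and $\nu$ over $\mathcal{A} \times \mathcal{B}$, the set
$\overline{m}(\mu)$ is contained in an $M \|\mu - \nu\|_1$-neighborhood of $\overline{m}(\nu)$, where $M$ is a
bound in $\ell^2$-norm on $\overline{m}$; this is a fact that we will use repeatedly below.

Lemma 3.1 Let $\mu$ and $\nu$ be two probability distributions over $\mathcal{A} \times \mathcal{B}$. We assume
that the set-valued function $\overline{m}$ is bounded in norm by $M$, i.e., that there exists a real number
$M > 0$ such that

$$\forall\, (a, b) \in \mathcal{A} \times \mathcal{B}, \qquad \sup_{d \in \overline{m}(a,b)} \| d \|_2 \leq M .$$

Then

$$\sup_{d \in \overline{m}(\mu)} \; \inf_{c \in \overline{m}(\nu)} \| d - c \|_2 \;\leq\; M \, \| \mu - \nu \|_1 \;\leq\; M \sqrt{N_{\mathcal{A}} N_{\mathcal{B}}} \; \| \mu - \nu \|_2 ,$$

where the norms in the right-hand side are respectively the $\ell^1$- and $\ell^2$-norms between probability
distributions.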
tions. 

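Proof. Let $d$ be an element of $\overline{m}(\mu)$; it can be written as

$$d = \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} \mu_{a,b} \, \theta_{a,b}$$

for some elements $\theta_{a,b} \in \overline{m}(a, b)$. We consider

$$c = \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} \nu_{a,b} \, \theta_{a,b} ,$$

which is an element of $\overline{m}(\nu)$. Then by the triangle inequality,

$$\| d - c \|_2 = \Bigl\| \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} (\mu_{a,b} - \nu_{a,b}) \, \theta_{a,b} \Bigr\|_2 \leq \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} | \mu_{a,b} - \nu_{a,b} | \, \| \theta_{a,b} \|_2 \leq M \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} | \mu_{a,b} - \nu_{a,b} | .$$

This entails the first claimed inequality. The second one follows from an application of the
Cauchy-Schwarz inequality. □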
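
The two inequalities of the lemma are easy to check numerically. The sketch below does so on randomly drawn data; the sizes, the seed, and the way one selection $\theta_{a,b}$ is drawn in each payoff set (a point of the Euclidean ball of radius $M$) are assumptions made only for this illustration.

```python
import math
import random

random.seed(0)
N_A, N_B, DIM, M = 2, 3, 2, 1.0      # hypothetical sizes and norm bound
n = N_A * N_B                        # number of action pairs (a, b)

def random_distribution(size):
    weights = [random.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def random_point_in_ball(radius):
    """Rejection sampling in the Euclidean ball of R^DIM."""
    while True:
        p = [random.uniform(-radius, radius) for _ in range(DIM)]
        if math.sqrt(sum(x * x for x in p)) <= radius:
            return p

mu = random_distribution(n)
nu = random_distribution(n)
theta = [random_point_in_ball(M) for _ in range(n)]   # one selection per pair

# d = sum mu * theta lies in m(mu); c = sum nu * theta lies in m(nu).
d = [sum(mu[k] * theta[k][i] for k in range(n)) for i in range(DIM)]
c = [sum(nu[k] * theta[k][i] for k in range(n)) for i in range(DIM)]

lhs = math.sqrt(sum((di - ci) ** 2 for di, ci in zip(d, c)))
bound_l1 = M * sum(abs(m_k - n_k) for m_k, n_k in zip(mu, nu))
bound_l2 = M * math.sqrt(n) * math.sqrt(sum((m_k - n_k) ** 2 for m_k, n_k in zip(mu, nu)))
```

The chain `lhs <= bound_l1 <= bound_l2` mirrors the statement of the lemma for the particular pair $(d, c)$ built in its proof.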

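Corollary 3.1 When the set-valued function $\overline{m}$ is bounded in norm, for all
$y \in \Delta(\mathcal{B})$, the mapping $D_y : \Delta(\mathcal{A}) \to \mathbb{R}$ defined by

$$\forall\, x \in \Delta(\mathcal{A}), \qquad D_y(x) = \sup_{d \in \overline{m}(x, y)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2$$

is continuous.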

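Proof. We show that for all $x, x' \in \Delta(\mathcal{A})$, the condition $\| x' - x \|_1 \leq \varepsilon$
implies that $D_y(x) - D_y(x') \leq M \varepsilon$, where $M$ is the bound in norm over $\overline{m}$. Indeed,
fix $\delta > 0$ and let $d_{\delta,x} \in \overline{m}(x, y)$ be such that

$$D_y(x) \leq \inf_{c \in \mathcal{C}} \| c - d_{\delta,x} \|_2 + \delta . \qquad (2)$$

By Lemma 3.1 (with the choices $\mu = x \otimes y$ and $\nu = x' \otimes y$) there exists
$d_{\delta,x'} \in \overline{m}(x', y)$ such that $\| d_{\delta,x} - d_{\delta,x'} \|_2 \leq M \varepsilon + \delta$.
The triangle inequality entails that

$$\inf_{c \in \mathcal{C}} \| c - d_{\delta,x} \|_2 \leq \inf_{c \in \mathcal{C}} \| c - d_{\delta,x'} \|_2 + M \varepsilon + \delta .$$

Substituting in (2), we get that

$$D_y(x) \leq M \varepsilon + 2 \delta + \inf_{c \in \mathcal{C}} \| c - d_{\delta,x'} \|_2 \leq M \varepsilon + 2 \delta + D_y(x') ,$$

which, letting $\delta \to 0$, proves our continuity claim. □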

Necessary and sufficient condition for robust approachability. This condition reads as follows
and will be referred to as (RAC), an acronym that stands for "robust approachability condition."

Theorem 3.1 Suppose that the set-valued function $\overline{m}$ is bounded in norm by $M$. A closed convex
set $\mathcal{C} \subseteq \mathbb{R}^d$ is $\overline{m}$-robust approachable (with pure or mixed actions) if
and only if the following robust approachability condition is satisfied,

$$\forall\, y \in \Delta(\mathcal{B}), \quad \exists\, x \in \Delta(\mathcal{A}), \qquad \overline{m}(x, y) \subseteq \mathcal{C} . \qquad \text{(RAC)}$$

Proof of the necessity of Condition (RAC). If the condition does not hold, then there exists
$y_0 \in \Delta(\mathcal{B})$ such that for every $x \in \Delta(\mathcal{A})$, the set $\overline{m}(x, y_0)$ is
not included in $\mathcal{C}$, i.e., it contains at least one point not in $\mathcal{C}$. We consider the
mapping $D_{y_0}$ defined in the statement of Corollary 3.1. Since $\mathcal{C}$ is closed, distances of given
individual points to $\mathcal{C}$ are achieved; therefore, by the choice of $y_0$, we get that
$D_{y_0}(x) > 0$ for all $x \in \Delta(\mathcal{A})$. Now, since $D_{y_0}$ is continuous on the compact set
$\Delta(\mathcal{A})$, as asserted by the indicated corollary, it attains its minimum, whose value we denote by
$D_{\min} > 0$.

Assume now that the second player chooses at each round $y_t = y_0$ as his mixed action. In the case of
mixed actions taken and observed, denoting

$$\overline{x}_T = \frac{1}{T} \sum_{t=1}^{T} x_t ,$$

we get that $\nu_T = \overline{x}_T \otimes y_0$, and hence, for all strategies of the first player and for all
$T \geq 1$,

$$\sup_{d \in \overline{m}(\nu_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 = D_{y_0}(\overline{x}_T) \geq D_{\min} > 0 ,$$

which shows that $\mathcal{C}$ is not approachable.

The case of pure actions taken and observed is treated similarly, with the sole addition of a
concentration argument. By martingale convergence (e.g., repeated uses of the Hoeffding-Azuma inequality
together with an application of the Borel-Cantelli lemma), $\delta_T = \| \pi_T - \nu_T \|_1 \to 0$ almost surely
as $T \to \infty$. By applying Lemma 3.1 we get

$$\sup_{d \in \overline{m}(\pi_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 \;\geq\; \sup_{d \in \overline{m}(\nu_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 - M \delta_T \;\geq\; D_{\min} - M \delta_T$$

and simply take the lim inf in the above inequalities to conclude the argument. □

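That (RAC) is sufficient to get robust approachability is proved in a constructive way, by exhibiting
suitable strategies. We identify probability distributions over $\mathcal{A} \times \mathcal{B}$ with vectors
in $\mathbb{R}^{N_{\mathcal{A}} N_{\mathcal{B}}}$ and consider the vector-valued payoff function

$$m : (a, b) \in \mathcal{A} \times \mathcal{B} \;\longmapsto\; \delta_{(a,b)} \in \mathbb{R}^{N_{\mathcal{A}} N_{\mathcal{B}}} ,$$

which we extend multi-linearly to $\Delta(\mathcal{A}) \times \Delta(\mathcal{B})$; the target set will be

$$\widetilde{\mathcal{C}} = \bigl\{ \mu \in \Delta(\mathcal{A} \times \mathcal{B}) : \ \overline{m}(\mu) \subseteq \mathcal{C} \bigr\} . \qquad (3)$$

Since $\overline{m}$ is a linear function on $\Delta(\mathcal{A} \times \mathcal{B})$ and $\mathcal{C}$ is convex,
the set $\widetilde{\mathcal{C}}$ is convex as well. In addition, since $\mathcal{C}$ is closed,
$\widetilde{\mathcal{C}}$ is also closed.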

Lemma 3.2 Condition (RAC) is equivalent to the $m$-approachability of $\widetilde{\mathcal{C}}$.

Proof. This equivalence is immediate via Theorem 2.1. The latter indeed states that the
$m$-approachability of $\widetilde{\mathcal{C}}$ is equivalent to the fact that for all
$y \in \Delta(\mathcal{B})$, there exists some $x \in \Delta(\mathcal{A})$ such that $\mu = m(x, y)$, the
product-distribution between $x$ and $y$, belongs to $\widetilde{\mathcal{C}}$, i.e., satisfies
$\overline{m}(\mu) = \overline{m}(x, y) \subseteq \mathcal{C}$. □
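
As an illustration of the lifted target set and of Condition (RAC), consider the following Python sketch. It assumes a hypothetical one-dimensional game in which each payoff set is an interval and the target set is $(-\infty, 0]$, so that only the upper endpoints of the intervals matter; the condition is then checked on a finite grid of mixed actions, which is an approximation and not a proof.

```python
def upper_endpoint(hi, mu):
    """Upper endpoint of the interval sum_{a,b} mu[a][b] * [lo[a][b], hi[a][b]]."""
    return sum(mu[a][b] * hi[a][b]
               for a in range(len(hi)) for b in range(len(hi[0])))

def in_lifted_set(hi, mu, tol=1e-12):
    """Membership of mu in the lifted set: the payoff interval must lie in (-inf, 0]."""
    return upper_endpoint(hi, mu) <= tol

def product_distribution(x, y):
    return [[xa * yb for yb in y] for xa in x]

def rac_holds_on_grid(hi, steps=100):
    """Check 'for all y there exists x with m(x, y) in C' on a grid (2x2 games)."""
    grid = [k / steps for k in range(steps + 1)]
    for q in grid:
        y = (q, 1 - q)
        if not any(in_lifted_set(hi, product_distribution((p, 1 - p), y))
                   for p in grid):
            return False
    return True

# Hypothetical upper endpoints hi[a][b] of the payoff intervals.
hi_good = [[-1.0, 1.0], [1.0, -1.0]]   # some reply always keeps the interval <= 0
hi_bad = [[1.0, 1.0], [1.0, -1.0]]     # against b = 0 every reply exceeds 0
```

In this interval-valued example the lifted set is cut out of the simplex by a single linear inequality, so projecting onto it is cheap; this is the kind of favorable geometry under which the overall strategy remains efficient.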

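The above definition of $m$ entails the following rewriting,

$$\pi_T = \frac{1}{T} \sum_{t=1}^{T} m(A_t, B_t) \qquad \text{and} \qquad \nu_T = \frac{1}{T} \sum_{t=1}^{T} m(x_t, y_t) .$$

Let $P_{\widetilde{\mathcal{C}}}$ denote the projection operator onto $\widetilde{\mathcal{C}}$; the quantities
at hand in the definition of $m$-approachability of $\widetilde{\mathcal{C}}$ are given by

$$\widetilde{\varepsilon}_T = \bigl\| \pi_T - P_{\widetilde{\mathcal{C}}}(\pi_T) \bigr\|_2 = \inf_{\mu \in \widetilde{\mathcal{C}}} \| \pi_T - \mu \|_2$$

and

$$\widetilde{\varepsilon}'_T = \bigl\| \nu_T - P_{\widetilde{\mathcal{C}}}(\nu_T) \bigr\|_2 = \inf_{\mu \in \widetilde{\mathcal{C}}} \| \nu_T - \mu \|_2 .$$

We now relate the quantities of interest, i.e., the ones arising in the definition of
$\overline{m}$-robust approachability of $\mathcal{C}$, to the former quantities.

Lemma 3.3 With pure actions taken and observed,

$$\sup_{d \in \overline{m}(\pi_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 \leq M \sqrt{N_{\mathcal{A}} N_{\mathcal{B}}} \; \widetilde{\varepsilon}_T .$$

With mixed actions taken and observed,

$$\sup_{d \in \overline{m}(\nu_T)} \; \inf_{c \in \mathcal{C}} \| c - d \|_2 \leq M \sqrt{N_{\mathcal{A}} N_{\mathcal{B}}} \; \widetilde{\varepsilon}'_T .$$

Proof. Lemma 3.1 entails that the sets $\overline{m}(\pi_T)$ are included in
$M \sqrt{N_{\mathcal{A}} N_{\mathcal{B}}} \, \widetilde{\varepsilon}_T$-neighborhoods of
$\overline{m}\bigl(P_{\widetilde{\mathcal{C}}}(\pi_T)\bigr)$. Since by definition of $\widetilde{\mathcal{C}}$,
one has $\overline{m}\bigl(P_{\widetilde{\mathcal{C}}}(\pi_T)\bigr) \subseteq \mathcal{C}$, we get in particular
that the sets $\overline{m}(\pi_T)$ are included in
$M \sqrt{N_{\mathcal{A}} N_{\mathcal{B}}} \, \widetilde{\varepsilon}_T$-neighborhoods of $\mathcal{C}$, which is
exactly what was stated. The argument can be repeated with the $\nu_T$ to get the second bound in the
statement of the lemma. □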

Proof of the sufficiency of Condition (RAC). First, Lemma 3.2 shows that Condition (RAC)
(via Theorems 2.2 or 2.3) ensures the existence of strategies $m$-approaching $\widetilde{\mathcal{C}}$.
Second, Lemma 3.3 indicates that these strategies also $\overline{m}$-robust approach $\mathcal{C}$. (It even
translates the rates for the $m$-approachability of $\widetilde{\mathcal{C}}$ into rates for the
$\overline{m}$-robust approachability of $\mathcal{C}$; for instance, in the case of mixed actions taken and
observed, the $2/\sqrt{T}$ rate for the $m$-approachability of $\widetilde{\mathcal{C}}$ becomes a
$2M \sqrt{N_{\mathcal{A}} N_{\mathcal{B}} / T}$ rate for the $\overline{m}$-robust approachability of
$\mathcal{C}$, a fact that we will use in the proof of Theorem 6.1.) □

Two concluding remarks. Note that, as explained around Equation (1), the considered strategies
for $m$-approaching $\widetilde{\mathcal{C}}$, or equivalently $\overline{m}$-robust approaching
$\mathcal{C}$, are efficient as soon as projections in $\ell^2$-norm onto the set $\widetilde{\mathcal{C}}$
defined in (3) can be computed efficiently. The latter fact depends on the respective geometries of
$\overline{m}$ and $\mathcal{C}$. We will provide examples of favorable cases (see, e.g., Section 6.2.1
about minimization of external regret under partial monitoring).

A final remark is that the proposed strategies require full monitoring, as they rely on the observations
of either the pair of played mixed actions $(x_t, y_t)$ or of played pure actions $(A_t, B_t)$. They enjoy no
obvious extension to a case where only a bandit monitoring of the played sets $\overline{m}(x_t, y_t)$ or
$\overline{m}(A_t, B_t)$ would be available.



4. Robust approachability for concave-convex set-valued games. We consider in this section
the same setting of mixed actions taken and observed as in the previous section, that is, we deal with
set-valued payoff functions $\overline{m} : \Delta(\mathcal{A}) \times \Delta(\mathcal{B}) \to \mathcal{S}(\mathbb{R}^d)$
under full monitoring. However, in the previous section $\overline{m}$ was linear on
$\Delta(\mathcal{A}) \times \Delta(\mathcal{B})$, an assumption that we now weaken while still having that (RAC)
is the necessary and sufficient condition for robust approachability. The price to pay for this is the loss
of the possible efficiency of the approachability strategies exhibited and the worsening of the convergence
rates.

Formally, the functions $\overline{m} : \Delta(\mathcal{A}) \times \Delta(\mathcal{B}) \to \mathcal{S}(\mathbb{R}^d)$
that we will consider will satisfy one or several of the following properties.

Definition 4.1 A function $\overline{m} : \Delta(\mathcal{A}) \times \Delta(\mathcal{B}) \to \mathcal{S}(\mathbb{R}^d)$
is uniformly continuous in its first argument if for all $\varepsilon > 0$, there exists $\eta > 0$ such that
for all $x, x' \in \Delta(\mathcal{A})$ satisfying $\| x - x' \|_1 \leq \eta$ and for all
$y \in \Delta(\mathcal{B})$, the set $\overline{m}(x', y)$ is included in an $\varepsilon$-neighborhood of
$\overline{m}(x, y)$ in the Euclidean norm. Put differently,

$$\sup_{d \in \overline{m}(x', y)} \; \inf_{c \in \overline{m}(x, y)} \| d - c \|_2 \leq \varepsilon \qquad \text{or} \qquad \overline{m}(x', y) \subseteq \overline{m}(x, y) + \varepsilon B ,$$

where $B$ is the unit Euclidean ball in $\mathbb{R}^d$.

Uniform continuity in the second argument is defined symmetrically.

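Definition 4.2 A function $\overline{m} : \Delta(\mathcal{A}) \times \Delta(\mathcal{B}) \to \mathcal{S}(\mathbb{R}^d)$
is concave in its first argument if for all $x, x' \in \Delta(\mathcal{A})$, all $y \in \Delta(\mathcal{B})$, and
all $\alpha \in [0, 1]$,

$$\overline{m}\bigl( \alpha x + (1 - \alpha) x', \, y \bigr) \subseteq \alpha \, \overline{m}(x, y) + (1 - \alpha) \, \overline{m}(x', y) .$$

A function $\overline{m} : \Delta(\mathcal{A}) \times \Delta(\mathcal{B}) \to \mathcal{S}(\mathbb{R}^d)$ is convex
in its second argument if for all $x \in \Delta(\mathcal{A})$, all $y, y' \in \Delta(\mathcal{B})$, and all
$\alpha \in [0, 1]$,

$$\alpha \, \overline{m}(x, y) + (1 - \alpha) \, \overline{m}(x, y') \subseteq \overline{m}\bigl( x, \, \alpha y + (1 - \alpha) y' \bigr) .$$

An example of such a function $\overline{m}$ is discussed in Lemma 5.1.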

The following theorem indicates that (RAC) is the necessary and sufficient condition for the
$\overline{m}$-robust approachability of a closed convex set $\mathcal{C}$ with mixed actions when the payoff
function $\overline{m}$ satisfies all four properties stated above. (Boundedness of $\overline{m}$ indeed
follows from the continuity of $\overline{m}$ in each variable.)

Theorem 4.1 If $\overline{m}$ is bounded, convex, and uniformly continuous in its second argument, then (RAC)
entails that a closed convex set $\mathcal{C}$ is $\overline{m}$-robust approachable with mixed actions.

Conversely, if $\overline{m}$ is concave and uniformly continuous in its first argument, then a closed convex
set $\mathcal{C}$ can be $\overline{m}$-robust approachable with mixed actions only if (RAC) is satisfied.

Proof of the second statement of Theorem 4.1. The proof of Corollary 3.1 extends to the
case considered here and shows, provided that the result stated in Lemma 3.1 is replaced by its counterpart
following from Definition 4.1, that for all $y \in \Delta(\mathcal{B})$, the mapping $D_y$ is still continuous
over $\Delta(\mathcal{A})$. We now proceed by contradiction and assume that (RAC) is not satisfied; the first
part of the proof of the necessity of (RAC) in Theorem 3.1 also applies to the present case: there exists
$y_0$ such that $D_{y_0} \geq D_{\min} > 0$ over $\Delta(\mathcal{A})$. It then suffices to note that whenever
the second player resorts to $y_t = y_0$ at all rounds $t \geq 1$, then for all strategies of the first player,
the quantity of interest in robust approachability can be lower bounded as follows, thanks to the concavity
in the first argument:

$$\sup \left\{ \inf_{c \in \mathcal{C}} \| d - c \|_2 : \ d \in \frac{1}{T} \sum_{t=1}^{T} \overline{m}(x_t, y_0) \right\} \;\geq\; \sup \left\{ \inf_{c \in \mathcal{C}} \| d - c \|_2 : \ d \in \overline{m}\Bigl( \frac{1}{T} \sum_{t=1}^{T} x_t, \, y_0 \Bigr) \right\} = D_{y_0}\Bigl( \frac{1}{T} \sum_{t=1}^{T} x_t \Bigr) \geq D_{\min} > 0 .$$

Therefore, $\mathcal{C}$ is $\overline{m}$-robust approachable with mixed actions by no strategy of the first
player. □

The proof of the first statement of Theorem 4.1 relies on the use of approximately calibrated
strategies of the first player as introduced and studied (among others) by Dawid, Foster and Vohra, and
Mannor and Stoltz [19]. Formally, given $\eta > 0$, an $\eta$-calibrated strategy of the first player considers
some finite covering of $\Delta(\mathcal{B})$ by $N_\eta$ balls of radius $\eta$ and abides by the following
constraints. Denoting by $y^1, \ldots, y^{N_\eta}$ the centers of the balls in the covering (they form what will
be referred to later on as an $\eta$-grid), such a strategy chooses only forecasts in
$\{ y^1, \ldots, y^{N_\eta} \}$. We thus denote by $L_t$ the index chosen in $\{ 1, \ldots, N_\eta \}$ at round $t$
and by

$$N_T(\ell) = \sum_{t=1}^{T} \mathbb{1}_{\{ L_t = \ell \}}$$

the total number of rounds within the first $T$ ones when the element $\ell$ of the grid was chosen. We denote
by $( \,\cdot\, )_+$ the function that gives the nonnegative part of a real number. The final condition to be
satisfied is that for all $\delta > 0$, there exists an integer $T_\delta$ such that for all strategies of the
second player, with probability at least $1 - \delta$, for all $T \geq T_\delta$,

$$\sum_{\ell=1}^{N_\eta} \frac{N_T(\ell)}{T} \left( \left\| y^\ell - \frac{1}{N_T(\ell)} \sum_{t=1}^{T} y_t \, \mathbb{1}_{\{ L_t = \ell \}} \right\|_1 - \eta \right)_+ \leq \delta . \qquad (4)$$

This calibration criterion is slightly stronger than the classical $\eta$-calibration score usually considered
in the literature, which consists of omitting nonnegative parts in the criterion above and ensuring that for
all strategies of the second player, with probability at least $1 - \delta$, for all $T \geq T_\delta$,

$$\sum_{\ell=1}^{N_\eta} \frac{N_T(\ell)}{T} \left( \left\| y^\ell - \frac{1}{N_T(\ell)} \sum_{t=1}^{T} y_t \, \mathbb{1}_{\{ L_t = \ell \}} \right\|_1 - \eta \right) \leq \delta . \qquad (5)$$

The existence of a calibrated strategy in the sense of (4) however follows from the same
approachability-based construction studied in Mannor and Stoltz [19] to get (5) and is detailed in the
appendix. In the sequel we will only use the following consequence of calibration: that for all strategies of
the second player, with probability at least $1 - \delta$, for all $T \geq T_\delta$,

$$\max_{\ell = 1, \ldots, N_\eta} \; \frac{N_T(\ell)}{T} \left( \left\| y^\ell - \frac{1}{N_T(\ell)} \sum_{t=1}^{T} y_t \, \mathbb{1}_{\{ L_t = \ell \}} \right\|_1 - \eta \right) \leq \delta . \qquad (6)$$
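
For intuition, the calibration criterion can be evaluated on a finished sequence of forecasts. The Python sketch below assumes that the criterion is the weighted sum, over grid points, of the nonnegative part of the $\ell^1$ gap between a forecast and the average outcome on the rounds where it was issued, minus $\eta$; the function name `calibration_score` and the toy data are hypothetical.

```python
def calibration_score(grid, forecasts, outcomes, eta):
    """Sum over l of (N_T(l)/T) * (||y^l - average outcome when l was forecast||_1 - eta)_+ .

    grid      -- list of grid points y^l (tuples of probabilities)
    forecasts -- list of indices L_t into the grid
    outcomes  -- list of mixed actions y_t actually played by the second player
    """
    T = len(forecasts)
    score = 0.0
    for l, y_l in enumerate(grid):
        rounds = [t for t in range(T) if forecasts[t] == l]
        if not rounds:            # grid point never forecast: no contribution
            continue
        n = len(rounds)
        y_bar = [sum(outcomes[t][i] for t in rounds) / n for i in range(len(y_l))]
        gap = sum(abs(a - b) for a, b in zip(y_l, y_bar))
        score += (n / T) * max(gap - eta, 0.0)
    return score

grid = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
outcomes = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
good = calibration_score(grid, [0, 0, 1, 1], outcomes, eta=0.1)
poor = calibration_score(grid, [2, 2, 2, 2], outcomes, eta=0.1)
```

The first forecast sequence matches the conditional averages exactly and gets a zero score, whereas always forecasting the third grid point is penalized heavily. This only measures calibration after the fact; constructing a strategy that keeps the score small against every opponent is the harder problem addressed in the text.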



Proof of the first statement of Theorem 4.1. The insight of this proof is similar to the one
illustrated in Perchet [22]. We first note that it suffices to prove that for all $\varepsilon > 0$, the set
$\mathcal{C}_\varepsilon$ defined as the $\varepsilon$-neighborhood of $\mathcal{C}$ is $\overline{m}$-robust
approachable with mixed actions; this is so up to proceeding in regimes $r = 1, 2, \ldots$ each corresponding
to a dyadic value $\varepsilon_r = 2^{-r}$ and lasting for a number of rounds carefully chosen in terms of the
length of the previous regimes.

Therefore, we fix $\varepsilon > 0$ and associate with it a modulus of continuity $\eta > 0$ given by the uniform
continuity of $\overline{m}$ in its second argument. We consider an $\eta/2$-calibrated strategy of the first
player, which we will use as an auxiliary strategy. Since (RAC) is satisfied, we may associate with each
element $y^\ell$ of the underlying $\eta/2$-grid a mixed action $x^\ell \in \Delta(\mathcal{A})$ such that
$\overline{m}(x^\ell, y^\ell) \subseteq \mathcal{C}$. The main strategy of the first player then prescribes the
use of $x_t = x^{L_t}$ at each round $t \geq 1$. The intuition behind this definition is that if $y^{L_t}$ is
forecast by the auxiliary strategy, then since the latter is calibrated, one should play as well as possible
against $y^{L_t}$; in view of the aim at hand, which is approaching $\mathcal{C}$, such a good reply is
given by $x^{L_t}$.

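To assess the constructed strategy, we group rounds according to the values $\ell$ taken by the $L_t$; to that
end, we recall that $N_T(\ell)$ denotes the number of rounds in which $y^\ell$ was forecast and $x^\ell$ was
played. The average payoff up to round $T$ is then rewritten as

$$\frac{1}{T} \sum_{t=1}^{T} \overline{m}(x_t, y_t) = \sum_{\ell=1}^{N_{\eta/2}} \frac{N_T(\ell)}{T} \left( \frac{1}{N_T(\ell)} \sum_{t=1}^{T} \overline{m}(x^\ell, y_t) \, \mathbb{1}_{\{ L_t = \ell \}} \right) .$$

We denote for all $\ell$ such that $N_T(\ell) > 0$ the average of their corresponding mixed actions $y_t$ by

$$\overline{y}^\ell_T = \frac{1}{N_T(\ell)} \sum_{t=1}^{T} y_t \, \mathbb{1}_{\{ L_t = \ell \}} .$$

The convexity of $\overline{m}$ in its second argument leads to the inclusion

$$\frac{1}{T} \sum_{t=1}^{T} \overline{m}(x_t, y_t) = \sum_{\ell=1}^{N_{\eta/2}} \frac{N_T(\ell)}{T} \left( \frac{1}{N_T(\ell)} \sum_{t=1}^{T} \overline{m}(x^\ell, y_t) \, \mathbb{1}_{\{ L_t = \ell \}} \right) \subseteq \sum_{\ell=1}^{N_{\eta/2}} \frac{N_T(\ell)}{T} \, \overline{m}\bigl( x^\ell, \overline{y}^\ell_T \bigr) .$$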
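
The regrouping of rounds by forecast index can be checked numerically in the simplest situation, that of a scalar payoff that is linear in both mixed actions, for which the inclusion above becomes an equality. The payoff table, the replies, and the sequence of forecasts and outcomes in the Python sketch below are hypothetical toy data.

```python
def bilinear(x, y, table):
    """Scalar payoff m(x, y) = sum_{a,b} x[a] * y[b] * table[a][b]."""
    return sum(x[a] * y[b] * table[a][b]
               for a in range(len(x)) for b in range(len(y)))

table = [[1.0, -1.0], [0.0, 2.0]]
replies = [(1.0, 0.0), (0.25, 0.75)]     # x^l played when index l is forecast
labels = [0, 1, 1, 0, 1]                 # forecast indices L_t
ys = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0), (0.25, 0.75), (0.5, 0.5)]
T = len(labels)

# Direct average of the payoffs m(x_t, y_t) with x_t = x^{L_t}.
direct = sum(bilinear(replies[l], y, table) for l, y in zip(labels, ys)) / T

# Same quantity after grouping the rounds by forecast index.
grouped = 0.0
for l, x_l in enumerate(replies):
    rounds = [t for t in range(T) if labels[t] == l]
    if not rounds:
        continue
    n = len(rounds)
    y_bar = tuple(sum(ys[t][b] for t in rounds) / n for b in range(2))
    grouped += (n / T) * bilinear(x_l, y_bar, table)
```

For a set-valued payoff that is only convex in its second argument, the grouped expression is a superset of the direct average rather than an equal quantity, which is all the proof needs.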

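To show that the above-defined strategy $\overline{m}$-robust approaches
$\mathcal{C}_\varepsilon = \mathcal{C} + \varepsilon B$, it suffices to show that for all $\delta > 0$, there
exists an integer $T'_\delta$ such that for all strategies of the second player,

$$\mathbb{P}\left\{ \forall\, T \geq T'_\delta, \quad \sum_{\ell=1}^{N_{\eta/2}} \frac{N_T(\ell)}{T} \, \overline{m}\bigl( x^\ell, \overline{y}^\ell_T \bigr) \subseteq \mathcal{C} + (\varepsilon + \delta) B \right\} \geq 1 - \delta .$$

We denote by $M$ a bound in $\ell^2$-norm on $\overline{m}$, i.e., for all $x \in \Delta(\mathcal{A})$ and
$y \in \Delta(\mathcal{B})$ the inclusion $\overline{m}(x, y) \subseteq M B$ holds. We let
$\delta' = \delta (\eta/2) / (M N_{\eta/2})$ and define $T'_\delta$ as the time $T_{\delta'}$ corresponding to (6).

All statements that follow will be for all strategies of the second player and with probability at least
$1 - \delta' \geq 1 - \delta$, for all $T \geq T'_\delta$, as required. For each index $\ell$ of the grid, either
$\delta' T / N_T(\ell) \leq \eta/2$ or $\delta' T / N_T(\ell) > \eta/2$. In the first case, following (6),
$\| y^\ell - \overline{y}^\ell_T \|_1 \leq \eta/2 + \delta' T / N_T(\ell) \leq \eta$; since $\eta$ is the
modulus of continuity for $\varepsilon$, we get that

$$\overline{m}\bigl( x^\ell, \overline{y}^\ell_T \bigr) \subseteq \overline{m}( x^\ell, y^\ell ) + \varepsilon B \subseteq \mathcal{C} + \varepsilon B ,$$

where we used the definition of $x^\ell$ to get the second inclusion. In the second case, using the boundedness
of $\overline{m}$, we simply write

$$\frac{N_T(\ell)}{T} \, \overline{m}\bigl( x^\ell, \overline{y}^\ell_T \bigr) \subseteq \frac{N_T(\ell)}{T} \, M B \subseteq \frac{2 \delta'}{\eta} \, M B .$$

Summing these bounds over $\ell$ yields

$$\sum_{\ell=1}^{N_{\eta/2}} \frac{N_T(\ell)}{T} \, \overline{m}\bigl( x^\ell, \overline{y}^\ell_T \bigr) \subseteq \mathcal{C} + \varepsilon B + N_{\eta/2} \, \frac{2 \delta'}{\eta} \, M B = \mathcal{C} + (\varepsilon + \delta) B ,$$

where we used the definition of $\delta'$ in terms of $\delta$. This concludes the proof. □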

5. Approachability in games with partial monitoring: statement of the necessary and sufficient condition; links with robust approachability. A repeated vector-valued game with partial monitoring is described as follows (see, e.g., Mertens et al. [21], Rustichini [29], and the references therein). The players have respective finite action sets $\mathcal{I}$ and $\mathcal{J}$. We denote by $r : \mathcal{I} \times \mathcal{J} \to \mathbb{R}^d$ the vector-valued payoff function of the first player and extend it multi-linearly to $\Delta(\mathcal{I}) \times \Delta(\mathcal{J})$. At each round, players simultaneously choose their actions $I_t \in \mathcal{I}$ and $J_t \in \mathcal{J}$, possibly at random according to probability distributions denoted by $p_t \in \Delta(\mathcal{I})$ and $q_t \in \Delta(\mathcal{J})$. At the end of a round, the first player observes neither $J_t$ nor $r(I_t, J_t)$ but only a signal. There is a finite set $\mathcal{H}$ of possible signals; the feedback $S_t$ that is given to the first player is drawn at random according to the distribution $H(I_t, J_t)$, where the mapping $H : \mathcal{I} \times \mathcal{J} \to \Delta(\mathcal{H})$ is known by the first player.



Example 5.1 Examples of such partial monitoring games are provided by, e.g., Cesa-Bianchi et al. [6], among which we can cite the apple tasting problem, the label-efficient prediction constraint, and the multi-armed bandit settings.

Some additional notation will be useful. We denote by $R$ the norm of (the linear extension of) $r$,

\[
R = \max_{(i,j) \in \mathcal{I} \times \mathcal{J}} \bigl\| r(i,j) \bigr\|_2 .
\]

The cardinalities of the finite sets $\mathcal{I}$, $\mathcal{J}$, and $\mathcal{H}$ will be referred to as $N_{\mathcal{I}}$, $N_{\mathcal{J}}$, and $N_{\mathcal{H}}$.

Definition 2.1 can be extended as follows in this setting; the only new ingredient is the signaling structure, while the aim is unchanged.



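Definition 5.1 Let $\mathcal{C} \subseteq \mathbb{R}^d$ be some set; $\mathcal{C}$ is $r$-approachable for the signaling structure $H$ if there exists a strategy of the first player such that, for all $\varepsilon > 0$, there exists an integer $T_\varepsilon$ such that for all strategies of the second player,

\[
\mathbb{P}\left\{ \forall\, T \ge T_\varepsilon, \quad \inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) \right\|_2 \le \varepsilon \right\} \ge 1 - \varepsilon .
\]

That is, the first player has a strategy that ensures that the sequence of his average vector-valued payoffs converges to the set $\mathcal{C}$ (uniformly with respect to the strategies of the second player), even if he only observes the random signals $S_t$ as feedback.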
Our contributions. A necessary and sufficient condition for $r$-approachability with the signaling structure $H$ was stated and proved by Perchet [23]; we therefore need to indicate where our contribution lies. First, both proofs are constructive, but our strategy can be efficient (as soon as some projection operator can be computed efficiently, e.g., in the cases of external and internal regret minimization described below), whereas the one of Perchet [23] relies on auxiliary strategies that are calibrated and that require a grid that is progressively refined (leading to a step complexity that is exponential in the number $T$ of past steps); the latter construction is in essence the one used in Section 4. Second, we are able to exhibit convergence rates. Third, as far as elegance is concerned, our proof is short, compact, and more direct than the one of Perchet [23], which relied on several layers of notation (internal regret in games with partial monitoring, calibration of auxiliary strategies, etc.).



5.1 Statement of the necessary and sufficient condition for approachability in games with partial monitoring. To recall the mentioned approachability condition of Perchet [23] we need some additional notation: for all $q \in \Delta(\mathcal{J})$, we denote by $\widetilde{H}(q)$ the element in $\Delta(\mathcal{H})^{\mathcal{I}}$ defined as follows. For all $i \in \mathcal{I}$, its $i$-th component is given by the convex combination of probability distributions over $\mathcal{H}$

\[
\widetilde{H}(q)_i = H(i, q) = \sum_{j \in \mathcal{J}} q_j \, H(i,j) .
\]

Finally, we denote by $\mathcal{F}$ the convex set of feasible vectors of probability distributions over $\mathcal{H}$:

\[
\mathcal{F} = \bigl\{ \widetilde{H}(q) : \ q \in \Delta(\mathcal{J}) \bigr\} .
\]

A generic element of $\mathcal{F}$ will be denoted by $\sigma \in \mathcal{F}$ and we define the set-valued function $m$, for all $p \in \Delta(\mathcal{I})$ and $\sigma \in \mathcal{F}$, by

\[
m(p, \sigma) = \bigl\{ r(p, q') : \ q' \in \Delta(\mathcal{J}) \ \text{such that} \ \widetilde{H}(q') = \sigma \bigr\} .
\]

The necessary and sufficient condition exhibited by Perchet [23] for the $r$-approachability of $\mathcal{C}$ with the signaling structure $H$ can now be recalled. In the sequel we will refer to this condition as Condition (APM), an acronym that stands for "approachability with partial monitoring."

Condition 1 (referred to as Condition (APM)) The signaling structure $H$, the vector-payoff function $r$, and the set $\mathcal{C}$ satisfy

\[
\forall\, q \in \Delta(\mathcal{J}), \quad \exists\, p \in \Delta(\mathcal{I}), \quad \forall\, q' \in \Delta(\mathcal{J}), \qquad \widetilde{H}(q) = \widetilde{H}(q') \ \Rightarrow \ r(p, q') \in \mathcal{C} .
\]

The condition can be equivalently reformulated as

\[
\forall\, \sigma \in \mathcal{F}, \quad \exists\, p \in \Delta(\mathcal{I}), \qquad m(p, \sigma) \subseteq \mathcal{C} . \qquad \text{(APM)}
\]

That this condition is necessary was already proved in Section 3.1 of Perchet [23]. The subsequent sections show (in a constructive way) that Condition (APM) is also sufficient for $r$-approachability of closed convex sets $\mathcal{C}$ given the signaling structure $H$.



5.2 Links with robust approachability. As will become clear in the proof of Theorem 6.1, the key in our problem will be to ensure the robust approachability of $\mathcal{C}$ with the following non-linear set-valued payoff function, which is however concave-convex in the sense of Definition 4.2.

Lemma 5.1 The function

\[
(p, q) \in \Delta(\mathcal{I}) \times \Delta(\mathcal{J}) \ \longmapsto \ m\bigl(p, \widetilde{H}(q)\bigr)
\]

is concave in its first argument and convex in its second argument.

Unfortunately, efficient strategies for robust approachability were only proposed in the linear case, not in the concave-convex case. But we illustrate in the next example (and provide a general theory in the next section) how working in lifted spaces can lead to linearity and hence to efficiency.

Example 5.2 We consider a game in which the second player (the column player) can force the first player (the row player) to play a game of matching pennies in the dark by choosing actions $L$ or $M$; in the matrix below, the real numbers denote the payoffs while ♣ and ♥ denote the two possible signals. The respective sets of actions are $\mathcal{I} = \{T, B\}$ and $\mathcal{J} = \{L, M, R\}$.

             L           M           R
    T      1 / ♣      -1 / ♣       2 / ♥
    B     -1 / ♣       1 / ♣       3 / ♥

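In this example we only study the mapping $p \mapsto m(p, \clubsuit)$ and show that it is piecewise linear on $\Delta(\mathcal{I})$, thus, is induced by a linear mapping defined on a lifted space.

We introduce a set $\mathcal{A} = \{p_T, p_B, p_{1/2}\}$ of possibly mixed actions extending the set $\mathcal{I} = \{T, B\}$ of pure actions; the set $\mathcal{A}$ is composed of

\[
p_T = \delta_T, \qquad p_B = \delta_B, \qquad \text{and} \qquad p_{1/2} = \frac{1}{2} \delta_T + \frac{1}{2} \delta_B .
\]

Each mixed action in $\Delta(\mathcal{I})$ can be uniquely written as $p_\lambda = \lambda \, \delta_B + (1 - \lambda) \, \delta_T$ for some $\lambda \in [0, 1]$. Now, for $\lambda \ge 1/2$, first,

\[
p_\lambda = (2\lambda - 1) \, \delta_B + \bigl( 1 - (2\lambda - 1) \bigr) \, p_{1/2} ;
\]

second, by definition of $m$,

\[
m(p_\lambda, \clubsuit) = [1 - 2\lambda, \ 2\lambda - 1] ;
\]

since in particular $m(p_{1/2}, \clubsuit) = \{0\}$ and $m(\delta_B, \clubsuit) = [-1, 1]$, we have the convex decomposition

\[
m(p_\lambda, \clubsuit) = (2\lambda - 1) \, m(\delta_B, \clubsuit) + \bigl( 1 - (2\lambda - 1) \bigr) \, m(p_{1/2}, \clubsuit) ,
\]

which can be restated as

\[
m(p_\lambda, \clubsuit) = m\Bigl( (2\lambda - 1) \, \delta_B + \bigl( 1 - (2\lambda - 1) \bigr) \, p_{1/2}, \ \clubsuit \Bigr) = (2\lambda - 1) \, m(\delta_B, \clubsuit) + \bigl( 1 - (2\lambda - 1) \bigr) \, m(p_{1/2}, \clubsuit) .
\]

That is, $m(\,\cdot\,, \clubsuit)$ is linear on the subset of $\Delta(\mathcal{I})$ corresponding to mixed actions $p_\lambda$ with $\lambda \ge 1/2$.

A similar property holds on the subset of distributions with $\lambda \le 1/2$, so that we have proved that $m(\,\cdot\,, \clubsuit)$ is piecewise linear on $\Delta(\mathcal{I})$.

The linearity on a lifted space comes from the following observation: $m$ is induced by the linear extension to $\Delta(\mathcal{A})$ of the restriction of $m$ to $\mathcal{A}$ (see Definition 6.1 for a more formal statement).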
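The computation of Example 5.2 can be checked numerically. The following is a minimal sketch rather than part of the construction: the helper names `m_club` and `lifted` are hypothetical, intervals are represented by their two endpoints, and exact rational arithmetic is used so that equalities can be tested.

```python
from fractions import Fraction

# Payoffs r(i, j) of Example 5.2 for the columns L and M only: these are the
# columns compatible with observing the signal "club" whatever the row.
PAYOFF = {("T", "L"): 1, ("T", "M"): -1, ("B", "L"): -1, ("B", "M"): 1}
HALF = Fraction(1, 2)


def m_club(lam):
    """Endpoints of m(p_lam, club), where p_lam = lam*delta_B + (1-lam)*delta_T."""
    # r(p_lam, .) is linear in q, so its range over mixtures of L and M is the
    # interval spanned by its values at the two pure columns.
    values = [lam * PAYOFF[("B", j)] + (1 - lam) * PAYOFF[("T", j)] for j in ("L", "M")]
    return min(values), max(values)


def lifted(lam):
    """Same set, obtained by interpolating between the vertices p_T, p_1/2, p_B."""
    if lam >= HALF:
        weight, vertex = 2 * lam - 1, Fraction(1)  # weight put on p_B
    else:
        weight, vertex = 1 - 2 * lam, Fraction(0)  # weight put on p_T
    lo_v, hi_v = m_club(vertex)
    lo_h, hi_h = m_club(HALF)
    # Minkowski combination of two intervals with nonnegative weights.
    return (weight * lo_v + (1 - weight) * lo_h, weight * hi_v + (1 - weight) * hi_h)


if __name__ == "__main__":
    for k in range(9):
        lam = Fraction(k, 8)
        print(lam, m_club(lam), lifted(lam))
```

The two functions agree on each of the two pieces $\lambda \le 1/2$ and $\lambda \ge 1/2$, whereas interpolating directly between $\delta_T$ and $\delta_B$ would give $[-1, 1]$ at $\lambda = 1/2$ instead of $\{0\}$; this is why the vertex $p_{1/2}$ has to be added to the lifted action set.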

6. Application of robust approachability to games with partial monitoring: for a particular class of games encompassing regret minimization. In this section we consider the case where the signaling structure has some special properties described below (linked to linearity properties on lifted spaces) that can be exploited to get efficient strategies. The case of general signaling structures is then considered in Section 7, but the particular class of games considered here is already rich enough to encompass the minimization of external and internal regret.

6.1 Approachability in bi-piecewise linear games. To define bi-piecewise linearity of a game, we start from a technical lemma that shows that $m(p, \sigma)$ can be written as a finite convex combination of sets of the form $m(p, b)$, where $b$ belongs to some finite set $\mathcal{B} \subset \mathcal{F}$ that depends on the game. Under the additional assumption of piecewise linearity of the thus-defined mappings $m(\,\cdot\,, b)$, we then describe a (possibly) efficient strategy for approachability, followed by convergence rate guarantees.

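6.1.1 Bi-piecewise linearity of a game. A preliminary technical result.

Lemma 6.1 For any game with partial monitoring, there exists a finite set $\mathcal{B} \subset \mathcal{F}$ and a piecewise-linear (injective) mapping $\Phi : \mathcal{F} \to \Delta(\mathcal{B})$ such that

\[
\forall\, \sigma \in \mathcal{F}, \quad \forall\, p \in \Delta(\mathcal{I}), \qquad m(p, \sigma) = \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, m(p, b) ,
\]

where we denoted the convex weight vector $\Phi(\sigma) \in \Delta(\mathcal{B})$ by $\bigl( \Phi_b(\sigma) \bigr)_{b \in \mathcal{B}}$.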



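Proof. Since $\widetilde{H}$ is linear on the polytope $\Delta(\mathcal{J})$, Proposition 2.4 in Rambau and Ziegler [28] implies that its inverse application $\widetilde{H}^{-1}$ is a piecewise linear mapping of $\mathcal{F}$ into the subsets of $\Delta(\mathcal{J})$. This means that there exists a finite decomposition of $\mathcal{F}$ into polytopes $\{P_1, \ldots, P_K\}$ on each of which $\widetilde{H}^{-1}$ is linear. Up to a triangulation (see, e.g., Chapter 14 in [12]), we can assume that each $P_k$ is a simplex. Denote by $\mathcal{B}_k \subseteq \mathcal{F}$ the set of vertices of $P_k$; then, the finite subset stated in the lemma is

\[
\mathcal{B} = \bigcup_{k=1}^{K} \mathcal{B}_k ,
\]

the set of all vertices of all the simplices.

Fix any $\sigma \in \mathcal{F}$. It belongs to some simplex $P_k$, so that there exists a convex decomposition $\sigma = \sum_{b \in \mathcal{B}_k} \lambda_b \, b$; this decomposition is unique within the simplex $P_k$. If $\sigma$ belongs to two different simplices, then it actually belongs to their common face and the two possible decompositions coincide (some coefficients $\lambda_b$ in the above decomposition are null). All in all, with each $\sigma \in \mathcal{F}$, we can associate a unique decomposition in $\mathcal{B}$,

\[
\sigma = \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, b ,
\]

where the coefficients $\bigl( \Phi_b(\sigma) \bigr)_{b \in \mathcal{B}}$ form a convex weight vector over $\mathcal{B}$, i.e., belong to $\Delta(\mathcal{B})$; in addition, $\Phi_b(\sigma) > 0$ only if $b \in \mathcal{B}_k$, where $k$ is such that $\sigma \in P_k$.

Since $\widetilde{H}^{-1}$ is linear on each simplex $P_1, \ldots, P_K$, we therefore get

\[
\widetilde{H}^{-1}(\sigma) = \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, \widetilde{H}^{-1}(b) .
\]

Finally, the result is a consequence of the fact that

\[
m(p, \sigma) = r\bigl( p, \widetilde{H}^{-1}(\sigma) \bigr) = r\Biggl( p, \ \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, \widetilde{H}^{-1}(b) \Biggr) ,
\]

which implies, by linearity of $r$, that

\[
m(p, \sigma) = \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, r\bigl( p, \widetilde{H}^{-1}(b) \bigr) = \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, m(p, b) ,
\]

which concludes the proof. □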

Remark 6.1 The proof shows that $\Phi$ is piecewise linear on a finite decomposition of $\mathcal{F}$; it is therefore Lipschitz on $\mathcal{F}$. We denote by $\kappa_\Phi$ its Lipschitz constant with respect to the $\ell^2$-norms.

The main contribution of this subsection (Definition 6.1) relies on the following additional assumption.

Assumption 6.1 A game is bi-piecewise linear if $m(\,\cdot\,, b)$ is piecewise linear on $\Delta(\mathcal{I})$ for every $b \in \mathcal{B}$.

Assumption 6.1 means that for all $b \in \mathcal{B}$ there exists a decomposition of $\Delta(\mathcal{I})$ into polytopes on each of which $m(\,\cdot\,, b)$ is linear. Since $\mathcal{B}$ is finite, there exists a finite number of such decompositions, and thus there exists a decomposition into polytopes that refines all of them. (The latter is generated by the intersection of all considered polytopes as $b$ varies.) By construction, every $m(\,\cdot\,, b)$ is linear on any of the polytopes of this common decomposition. We denote by $\mathcal{A} \subset \Delta(\mathcal{I})$ the finite subset of all their vertices: a construction similar to the one used in the proof of Lemma 6.1 (provided above) then leads to a piecewise linear (injective) mapping $\Theta : \Delta(\mathcal{I}) \to \Delta(\mathcal{A})$, where $\Theta(p)$ is the decomposition of $p$ on the vertices of the polytope(s) of the decomposition to which it belongs, satisfying

\[
\forall\, b \in \mathcal{B}, \quad \forall\, p \in \Delta(\mathcal{I}), \qquad m(p, b) = \sum_{a \in \mathcal{A}} \Theta_a(p) \, m(a, b) ,
\]

where we denoted the convex weight vector $\Theta(p) \in \Delta(\mathcal{A})$ by $\bigl( \Theta_a(p) \bigr)_{a \in \mathcal{A}}$. This, Lemma 6.1, and Assumption 6.1 show that on a lifted space, $m$ coincides with a bi-linear mapping $\overline{m}$, as is made formal in the next definition.

Definition 6.1 We denote by $\overline{m}$ the linear extension to $\Delta(\mathcal{A} \times \mathcal{B})$ of the restriction of $m$ to $\mathcal{A} \times \mathcal{B}$, so that for all $p \in \Delta(\mathcal{I})$ and $\sigma \in \mathcal{F}$,

\[
m(p, \sigma) = \overline{m}\bigl( \Theta(p), \Phi(\sigma) \bigr) .
\]
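Approaching Strategy in Games with Partial Monitoring

Parameters: an integer block length $L \ge 1$, an exploration parameter $\gamma \in [0, 1]$, a strategy $\Psi$ for $\overline{m}$-robust approachability of $\mathcal{C}$

Notation: $u \in \Delta(\mathcal{I})$ is the uniform distribution over $\mathcal{I}$, $P_{\mathcal{F}}$ denotes the projection operator in $\ell^2$-norm of $\mathbb{R}^{\mathcal{H} \times \mathcal{I}}$ onto $\mathcal{F}$

Initialization: compute the finite set $\mathcal{B}$ and the mapping $\Phi : \mathcal{F} \to \Delta(\mathcal{B})$ of Lemma 6.1, compute the finite set $\mathcal{A}$ and the mapping $\Theta : \Delta(\mathcal{I}) \to \Delta(\mathcal{A})$ defined based on Assumption 6.1, pick an arbitrary $\theta_1 \in \Delta(\mathcal{A})$

For all blocks $n = 1, 2, \ldots$,

(i) define $x_n = \sum_{a \in \mathcal{A}} \theta_{n,a} \, a$ and $p_n = (1 - \gamma) \, x_n + \gamma \, u$;

(ii) for rounds $t = (n-1)L + 1, \ldots, nL$,

    2.1 draw an action $I_t \in \mathcal{I}$ at random according to $p_n$;

    2.2 get the signal $S_t$;

(iii) form the estimated vector of probability distributions over signals,
\[
\widetilde{\sigma}_n = \frac{1}{L} \sum_{t=(n-1)L+1}^{nL} \left( \frac{\mathbb{1}_{\{S_t = s\}} \mathbb{1}_{\{I_t = i\}}}{p_{I_t, n}} \right)_{(i,s) \in \mathcal{I} \times \mathcal{H}} ;
\]

(iv) compute the projection $\widehat{\sigma}_n = P_{\mathcal{F}}\bigl( \widetilde{\sigma}_n \bigr)$;

(v) choose $\theta_{n+1} = \Psi\bigl( \theta_1, \Phi(\widehat{\sigma}_1), \ldots, \theta_n, \Phi(\widehat{\sigma}_n) \bigr)$.

Figure 1: The proposed strategy, which plays in blocks.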



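6.1.2 Construction of a strategy to approach $\mathcal{C}$. The approaching strategy for the original problem is based on a strategy $\Psi$ for $\overline{m}$-robust approachability of $\mathcal{C}$, provided by Theorem 3.1; we therefore first need to prove the existence of such a $\Psi$.

Lemma 6.2 Under Condition (APM), the closed convex set $\mathcal{C}$ is $\overline{m}$-robust approachable.

Proof. We show that Condition (RAC) in Theorem 3.1 is satisfied, that is, that for all $y \in \Delta(\mathcal{B})$, there exists some $x \in \Delta(\mathcal{A})$ such that $\overline{m}(x, y) \subseteq \mathcal{C}$. With such a given $y \in \Delta(\mathcal{B})$, we associate$^1$ the feasible vector of signals $\sigma = \sum_{b \in \mathcal{B}} y_b \, b \in \mathcal{F}$ and let $p$ be given by Condition (APM), so that $m(p, \sigma) \subseteq \mathcal{C}$. By linearity of $\overline{m}$ (for the first equality), by convexity of $m$ in its second argument (for the first inclusion), by Lemma 6.1 (for the second and fourth equalities), and by construction of $\mathcal{A}$ (for the third equality),

\[
\overline{m}\bigl( \Theta(p), y \bigr) = \sum_{a \in \mathcal{A}} \Theta_a(p) \sum_{b \in \mathcal{B}} y_b \, m(a, b) \subseteq \sum_{a \in \mathcal{A}} \Theta_a(p) \, m(a, \sigma) = \sum_{a \in \mathcal{A}} \Theta_a(p) \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, m(a, b)
\]
\[
= \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, m(p, b) = m(p, \sigma) \subseteq \mathcal{C} ,
\]

which concludes the proof. □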

We consider the strategy described in Figure 1. It forces exploration at a $\gamma$ rate, as is usual in situations with partial monitoring. One of its key ingredients, that conditionally unbiased estimators are available, is extracted from Section 6 in the article by Lugosi et al. [16]: in block $n$ we consider sums of elements of the form

\[
\widehat{H}_t = \left( \frac{\mathbb{1}_{\{S_t = s\}} \mathbb{1}_{\{I_t = i\}}}{p_{I_t, n}} \right)_{(i,s) \in \mathcal{I} \times \mathcal{H}} \in \mathbb{R}^{\mathcal{H} \times \mathcal{I}} .
\]

Averaging over the respective random draws of $I_t$ and $S_t$ according to $p_n$ and $H(I_t, J_t)$, i.e., taking the conditional expectation $\mathbb{E}_t$ with respect to $p_n$ and $J_t$, we get

\[
\mathbb{E}_t\bigl[ \widehat{H}_t \bigr] = \widetilde{H}\bigl( \delta_{J_t} \bigr) . \qquad (7)
\]

$^1$Note however that we do not necessarily have that $\Phi(\sigma)$ and $y$ are equal, as $\Phi$ is not a one-to-one mapping (it is injective but not surjective).
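Indeed, the conditional expectation of the component $i$ of $\widehat{H}_t$ equals

\[
\mathbb{E}_t\left[ \left( \frac{\mathbb{1}_{\{S_t = s\}} \mathbb{1}_{\{I_t = i\}}}{p_{I_t, n}} \right)_{s \in \mathcal{H}} \right] = \mathbb{E}_t\left[ \frac{H(I_t, J_t) \, \mathbb{1}_{\{I_t = i\}}}{p_{I_t, n}} \right] = \frac{H(i, J_t)}{p_{i,n}} \, p_{i,n} = H(i, J_t) ,
\]

where we first took the expectation over the random draw of $S_t$ (conditionally on $p_n$, $J_t$, and $I_t$) and then over the one of $I_t$. Consequently, concentration-of-measure arguments can show that for $L$ large enough,

\[
\widetilde{\sigma}_n = \frac{1}{L} \sum_{t=(n-1)L+1}^{nL} \widehat{H}_t \quad \text{is close to} \quad \widetilde{H}\bigl( \widehat{q}_n \bigr) , \qquad \text{where} \quad \widehat{q}_n = \frac{1}{L} \sum_{t=(n-1)L+1}^{nL} \delta_{J_t} .
\]

Actually, since $\mathcal{F} \subseteq \Delta(\mathcal{H})^{\mathcal{I}}$, we have a natural embedding of $\mathcal{F}$ into $\mathbb{R}^{\mathcal{H} \times \mathcal{I}}$ and we can define $P_{\mathcal{F}}$, the convex projection operator onto $\mathcal{F}$ (in $\ell^2$-norm). Instead of using $\widetilde{\sigma}_n$ directly, we consider in our strategy $\widehat{\sigma}_n = P_{\mathcal{F}}\bigl( \widetilde{\sigma}_n \bigr)$, which is even closer to $\widetilde{H}\bigl( \widehat{q}_n \bigr)$.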
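The unbiasedness stated in (7) can be verified exactly on a small instance. The sketch below is illustrative only: the function name `expected_estimate`, the dictionary-based encoding of the signaling structure, and the two-signal example in the main guard are assumptions made for this example, not objects defined in the text.

```python
from fractions import Fraction


def expected_estimate(p, H, j):
    """Exact conditional expectation of the estimator when the column is j.

    p: dict mapping each row action i to its (positive) probability p_{i,n}.
    H: dict mapping (i, j) to a dict {signal: probability}.
    Returns a dict mapping (i, s) to the expected value of that entry.
    """
    signals = sorted({s for law in H.values() for s in law})
    expectation = {(i, s): Fraction(0) for i in p for s in signals}
    for i_t, prob_i in p.items():                # draw I_t according to p
        for s_t, prob_s in H[(i_t, j)].items():  # draw S_t according to H(I_t, j)
            # The estimator has one non-zero entry, at (I_t, S_t), equal to 1 / p[I_t].
            expectation[(i_t, s_t)] += prob_i * prob_s / prob_i
    return expectation


if __name__ == "__main__":
    gamma = Fraction(1, 10)
    x = {"T": Fraction(1), "B": Fraction(0)}  # exploitation part of the mixed action
    p = {i: (1 - gamma) * x[i] + gamma * Fraction(1, 2) for i in x}
    # Hypothetical noisy signaling structure with two signals "a" and "b".
    H = {("T", "L"): {"a": Fraction(3, 4), "b": Fraction(1, 4)},
         ("B", "L"): {"a": Fraction(1, 3), "b": Fraction(2, 3)}}
    print(expected_estimate(p, H, "L"))
```

The division by `prob_i` is where the forced exploration matters: with $\gamma = 0$ and $x_n = \delta_T$ the row $B$ would have probability zero and its component could not be estimated, whereas mixing with the uniform distribution keeps every $p_{i,n}$ at least $\gamma / N_{\mathcal{I}}$.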

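More precisely, the following result can be extracted from the proof of Theorem 6.1 in Lugosi et al. [16]. The proof is provided in Appendix B.

Lemma 6.3 With probability $1 - \delta$,

\[
\Bigl\| \widehat{\sigma}_n - \widetilde{H}\bigl( \widehat{q}_n \bigr) \Bigr\|_2 \le \sqrt{N_{\mathcal{I}} N_{\mathcal{H}}} \left( \sqrt{\frac{2 N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta}} + \frac{1}{3} \frac{N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta} \right) .
\]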



6.1.3 A performance guarantee for the strategy of Figure 1. For the sake of simplicity, we first provide a performance bound for fixed parameters $\gamma$ and $L$ tuned as functions of $T$. Adaptation to $T \to \infty$ is then described in the next section; note that it cannot be performed by simply proceeding in regimes, as the approachability guarantees offered by the second part of the theorem are only at time round $T$. (This is so because the considered strategy depends on $T$ via the parameters $\gamma$ and $L$.)

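Theorem 6.1 Consider a closed convex set $\mathcal{C}$ and a game $(r, H)$ for which Condition (APM) is satisfied and that is bi-piecewise linear in the sense of Assumption 6.1. Then, for all $T \ge 1$, the strategy of Figure 1, run with parameters $\gamma \in [0, 1]$ and $L \ge 1$ and fed with a strategy $\Psi$ for $\overline{m}$-robust approachability of $\mathcal{C}$ (provided by Lemma 6.2), is such that, with probability at least $1 - \delta$,

\[
\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) \right\|_2 \le \frac{2L}{T} R + 4R \sqrt{\frac{\ln\bigl( (2T)/(L\delta) \bigr)}{T}} + 2 \gamma R + \frac{2R}{\sqrt{T/L - 1}}
\]
\[
\qquad + R \, \kappa_\Phi \sqrt{N_{\mathcal{I}} N_{\mathcal{H}} N_{\mathcal{B}}} \left( \sqrt{\frac{2 N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}} T}{L \delta}} + \frac{1}{3} \frac{N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}} T}{L \delta} \right) .
\]

In particular, for all $T \ge 1$, the choices of $L = \lceil T^{3/5} \rceil$ and $\gamma = T^{-1/5}$ imply that with probability at least $1 - \delta$,

\[
\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) \right\|_2 \le \Box \left( T^{-1/5} \sqrt{\ln \frac{T}{\delta}} + T^{-2/5} \ln \frac{T}{\delta} \right)
\]

for some constant $\Box$ depending only on $\mathcal{C}$ and on the game $(r, H)$ at hand.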
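The tuning of $L$ and $\gamma$ can be understood through a back-of-the-envelope balance, given here as a sketch rather than as part of the proof. Assume, as the five steps of the proof suggest, that up to constants and logarithmic factors the bound is a sum of terms of orders $L/T$, $1/\sqrt{T}$, $\gamma$, $\sqrt{L/T}$, $1/\sqrt{\gamma L}$, and $1/(\gamma L)$, and parametrize $L = T^{a}$ and $\gamma = T^{-b}$ (the exponents $a$ and $b$ are notation introduced only for this computation).

```latex
\begin{align*}
  \gamma &= T^{-b}, &
  \sqrt{L/T} &= T^{(a-1)/2}, &
  \frac{1}{\sqrt{\gamma L}} &= T^{(b-a)/2}. \\
\intertext{Equating the three exponents gives}
  -b = \frac{b-a}{2} &\iff a = 3b, &
  \frac{b-a}{2} = \frac{a-1}{2} &\iff b = 2a - 1, \\
\intertext{so that}
  a &= \frac{3}{5}, &
  b &= \frac{1}{5}, &
  \gamma = \sqrt{L/T} = \frac{1}{\sqrt{\gamma L}} &= T^{-1/5}. \\
\intertext{The remaining terms are of smaller order:}
  \frac{L}{T} &= T^{-2/5}, &
  \frac{1}{\gamma L} &= T^{-2/5}, &
  \frac{1}{\sqrt{T}} &= T^{-1/2}.
\end{align*}
```

Under these assumptions the dominant terms are of order $T^{-1/5}$, and the terms $L/T$ and $1/(\gamma L)$ account for the $T^{-2/5}$ part of the stated rate.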



The efficiency of the strategy of Figure 1 depends on whether it can be fed with an efficient approachability strategy $\Psi$, which in turn depends on the respective geometries of $\overline{m}$ and $\mathcal{C}$, as was indicated before the statement of Theorem 3.1. (Note that the projection onto $\mathcal{F}$ can be performed in polynomial time, as the latter closed convex set is defined by finitely many linear constraints, and that the computation of $\mathcal{A}$, $\mathcal{B}$, and $\overline{m}$ can be performed beforehand.) In any case, the per-round complexity is constant (though possibly large).

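Proof. We write $T$ as $T = NL + k$ where $N$ is an integer and $0 \le k \le L - 1$, and will show successively that (possibly with overwhelming probability only) the following statements hold.

\[
\frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) \quad \text{is close to} \quad \frac{1}{NL} \sum_{t=1}^{NL} r(I_t, J_t) ; \qquad (8)
\]
\[
\frac{1}{NL} \sum_{t=1}^{NL} r(I_t, J_t) \quad \text{is close to} \quad \frac{1}{N} \sum_{n=1}^{N} r\bigl( p_n, \widehat{q}_n \bigr) ; \qquad (9)
\]
\[
\frac{1}{N} \sum_{n=1}^{N} r\bigl( p_n, \widehat{q}_n \bigr) \quad \text{is close to} \quad \frac{1}{N} \sum_{n=1}^{N} r\bigl( x_n, \widehat{q}_n \bigr) ; \qquad (10)
\]
\[
\frac{1}{N} \sum_{n=1}^{N} r\bigl( x_n, \widehat{q}_n \bigr) = \frac{1}{N} \sum_{n=1}^{N} \sum_{a \in \mathcal{A}} \theta_{n,a} \, r\bigl( a, \widehat{q}_n \bigr) \quad \text{belongs to the set} \quad \frac{1}{N} \sum_{n=1}^{N} \sum_{a \in \mathcal{A}} \theta_{n,a} \, m\bigl( a, \widetilde{H}(\widehat{q}_n) \bigr) ;
\]
\[
\frac{1}{N} \sum_{n=1}^{N} \sum_{a \in \mathcal{A}} \theta_{n,a} \, m\bigl( a, \widetilde{H}(\widehat{q}_n) \bigr) \quad \text{is equal to the set} \quad \frac{1}{N} \sum_{n=1}^{N} \overline{m}\Bigl( \theta_n, \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr) ;
\]
\[
\frac{1}{N} \sum_{n=1}^{N} \overline{m}\Bigl( \theta_n, \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr) \quad \text{is close to the set} \quad \frac{1}{N} \sum_{n=1}^{N} \overline{m}\bigl( \theta_n, \Phi(\widehat{\sigma}_n) \bigr) ; \qquad (11)
\]
\[
\frac{1}{N} \sum_{n=1}^{N} \overline{m}\bigl( \theta_n, \Phi(\widehat{\sigma}_n) \bigr) \quad \text{is close to the set} \quad \mathcal{C} ; \qquad (12)
\]

where we recall that the notation $\widehat{q}_n$ was defined above and refers to the empirical distribution of the $J_t$ in the $n$-th block. Actually, we will show below the numbered statements only. The first unnumbered statement is immediate by the definition of $x_n$, the linearity of $r$, and the very definition of $m$; while the second one follows from Definition 6.1:

\[
\frac{1}{N} \sum_{n=1}^{N} \sum_{a \in \mathcal{A}} \theta_{n,a} \, m\bigl( a, \widetilde{H}(\widehat{q}_n) \bigr) = \frac{1}{N} \sum_{n=1}^{N} \sum_{(a,b) \in \mathcal{A} \times \mathcal{B}} \theta_{n,a} \, \Phi_b\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \, m(a, b) = \frac{1}{N} \sum_{n=1}^{N} \overline{m}\Bigl( \theta_n, \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr) .
\]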

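Step 1: Assertion (8). A direct calculation decomposing the sum over $T$ elements into a sum over the $NL$ first elements and the $k$ remaining ones shows that

\[
\left\| \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) - \frac{1}{NL} \sum_{t=1}^{NL} r(I_t, J_t) \right\|_2 \le R \left( \frac{k}{T} + \left( \frac{1}{NL} - \frac{1}{T} \right) NL \right) = \frac{2k}{T} R \le \frac{2L}{T} R .
\]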

Step 2: Assertion (9). We note that by defining $\mathbb{E}_t$ as the conditional expectation with respect to $(I_1, S_1, J_1), \ldots, (I_{t-1}, S_{t-1}, J_{t-1})$ and $J_t$, which fixes the values of the distribution $p'_t$ of $I_t$ and the value of $J_t$, we have

\[
\mathbb{E}_t\bigl[ r(I_t, J_t) \bigr] = r\bigl( p'_t, J_t \bigr) .
\]

We note that by definition of the forecaster, $p'_t = p_n$ if $t$ belongs to the $n$-th block. By a version of the Hoeffding-Azuma inequality for sums of Hilbert space-valued martingale differences stated as Lemma 3.2 in Chen and White, we therefore get that with probability at least $1 - \delta$,

\[
\left\| \frac{1}{NL} \sum_{t=1}^{NL} r(I_t, J_t) - \frac{1}{N} \sum_{n=1}^{N} r\bigl( p_n, \widehat{q}_n \bigr) \right\|_2 \le 4R \sqrt{\frac{\ln(2/\delta)}{T}} .
\]

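Step 3: Assertion (10). Since by definition $p_n = (1 - \gamma) \, x_n + \gamma \, u$, we get

\[
\left\| \frac{1}{N} \sum_{n=1}^{N} r\bigl( p_n, \widehat{q}_n \bigr) - \frac{1}{N} \sum_{n=1}^{N} r\bigl( x_n, \widehat{q}_n \bigr) \right\|_2 \le 2 \gamma R .
\]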

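Step 4: Assertion (11). We fix a given block $n$. Lemma 6.3 indicates that with probability $1 - \delta$,

\[
\Bigl\| \widehat{\sigma}_n - \widetilde{H}\bigl( \widehat{q}_n \bigr) \Bigr\|_2 \le \sqrt{N_{\mathcal{I}} N_{\mathcal{H}}} \left( \sqrt{\frac{2 N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta}} + \frac{1}{3} \frac{N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta} \right) . \qquad (13)
\]

Since $\Phi$ is Lipschitz (see Remark 6.1), with a Lipschitz constant in $\ell^2$-norms denoted by $\kappa_\Phi$, we get that with probability $1 - \delta$,

\[
\Bigl\| \Phi\bigl( \widehat{\sigma}_n \bigr) - \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr\|_2 \le \kappa_\Phi \sqrt{N_{\mathcal{I}} N_{\mathcal{H}}} \left( \sqrt{\frac{2 N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta}} + \frac{1}{3} \frac{N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta} \right) .
\]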

'Together with the fact that v^e"" s£ 6""/^ foj. „ j> 0. 
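By a union bound, the above bound holds for all blocks $n = 1, \ldots, N$ with probability at least $1 - N\delta$. Finally, an application of Lemma 3.1 shows that

\[
\frac{1}{N} \sum_{n=1}^{N} \overline{m}\Bigl( \theta_n, \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr) \quad \text{is in an } \varepsilon_T\text{-neighborhood (in } \ell^2\text{-norm) of} \quad \frac{1}{N} \sum_{n=1}^{N} \overline{m}\bigl( \theta_n, \Phi(\widehat{\sigma}_n) \bigr) ,
\]

where

\[
\varepsilon_T = R \, \kappa_\Phi \sqrt{N_{\mathcal{I}} N_{\mathcal{H}} N_{\mathcal{B}}} \left( \sqrt{\frac{2 N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta}} + \frac{1}{3} \frac{N_{\mathcal{I}}}{\gamma L} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta} \right) .
\]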

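Step 5: Assertion (12). Since $\mathcal{C}$ is $\overline{m}$-robust approachable and by definition of the choices of the $\theta_n$ in Figure 1, we get by (the proof of the sufficiency part of) Theorem 3.1 that, with probability 1, every element of the set

\[
\frac{1}{N} \sum_{n=1}^{N} \overline{m}\bigl( \theta_n, \Phi(\widehat{\sigma}_n) \bigr)
\]

is at $\ell^2$-distance from $\mathcal{C}$ at most

\[
\frac{2R}{\sqrt{N}} \le \frac{2R}{\sqrt{T/L - 1}} ,
\]

since $T/L \le N + k/L \le N + 1$.

Conclusion of the proof. The proof is concluded by putting the pieces together, thanks to a triangle inequality and by considering $L\delta/T \le \delta/(N+1)$ instead of $\delta$. □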

6.1.4 Uniform guarantees over time for a time-adaptive version of the strategy of Figure 1. We present here a variant of the strategy of Figure 1 for which the lengths $L_n$ of blocks $n$ and the exploration rates $\gamma_n$ are no longer constant. To do so, we need the following generalization of Theorem 2.2 to polynomial averages; this result is of independent interest. We only state the result for mixed actions taken and observed, but the generalization to pure actions follows easily.

Consider the setting of Theorem 2.2. The studied strategy relies on a parameter $\alpha \ge 0$. It plays an arbitrary $x_1$. For $t \ge 1$, it forms at stage $t + 1$ the vector-valued polynomial average

\[
\widehat{m}_t^\alpha = \frac{1}{T_t^\alpha} \sum_{s=1}^{t} s^\alpha \, m(x_s, y_s) , \qquad \text{where} \quad T_t^\alpha = \sum_{s=1}^{t} s^\alpha ,
\]

computes its projection $c_t^\alpha$ onto $\mathcal{C}$, and resorts to a mixed action $x_{t+1}$ solving the minimax equation

\[
\min_{x \in \Delta(\mathcal{A})} \ \max_{y \in \Delta(\mathcal{B})} \ \bigl\langle \widehat{m}_t^\alpha - c_t^\alpha, \ m(x, y) \bigr\rangle .
\]

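Theorem 6.2 We denote by $M$ a bound in norm over $m$, i.e.,

\[
\max_{(a,b) \in \mathcal{A} \times \mathcal{B}} \bigl\| m(a, b) \bigr\|_2 \le M .
\]

For all $\alpha \ge 0$, when $\mathcal{C}$ is an approachable closed convex set, the above strategy ensures that for all strategies of the second player, with probability 1, for all $T \ge 1$,

\[
\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T_T^\alpha} \sum_{t=1}^{T} t^\alpha \, m(x_t, y_t) \right\|_2 \le 2M \, \frac{\sqrt{\sum_{t=1}^{T} t^{2\alpha}}}{T_T^\alpha} \le \frac{2 M K_\alpha}{\sqrt{T}} , \qquad (14)
\]

where $T_T^\alpha = \sum_{t=1}^{T} t^\alpha$ and $K_\alpha$ is a constant depending only on $\alpha$.

It is interesting to note that the convergence rates are independent of $\alpha$ and are the same as for standard approachability ($1/\sqrt{T}$).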



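Proof. The proof is a slight modification of the one of Theorem 2.2. We write $T_t^\alpha = \sum_{s=1}^{t} s^\alpha$ and denote by $d_t^\alpha$ the squared distance of $\widehat{m}_t^\alpha$ to $\mathcal{C}$,

\[
d_t^\alpha = \inf_{c \in \mathcal{C}} \bigl\| c - \widehat{m}_t^\alpha \bigr\|_2^2 = \bigl\| c_t^\alpha - \widehat{m}_t^\alpha \bigr\|_2^2 ,
\]

and use the shortcut notation $m_t = m(x_t, y_t)$ for all $t \ge 1$. Then,

\[
d_{t+1}^\alpha \le \bigl\| \widehat{m}_{t+1}^\alpha - c_t^\alpha \bigr\|_2^2 = \left\| \widehat{m}_t^\alpha - c_t^\alpha + \frac{(t+1)^\alpha}{T_{t+1}^\alpha} \bigl( m_{t+1} - \widehat{m}_t^\alpha \bigr) \right\|_2^2
\]
\[
= d_t^\alpha + \frac{2 (t+1)^\alpha}{T_{t+1}^\alpha} \bigl\langle \widehat{m}_t^\alpha - c_t^\alpha, \ m_{t+1} - \widehat{m}_t^\alpha \bigr\rangle + \left( \frac{(t+1)^\alpha}{T_{t+1}^\alpha} \right)^2 \bigl\| m_{t+1} - \widehat{m}_t^\alpha \bigr\|_2^2
\]
\[
\le \left( 1 - \frac{2 (t+1)^\alpha}{T_{t+1}^\alpha} \right) d_t^\alpha + 4 M^2 \left( \frac{(t+1)^\alpha}{T_{t+1}^\alpha} \right)^2 ,
\]

where we used in the last inequality the boundedness of $m$ and the same convex projection inequality as in the proof of Theorem 2.2, namely $\bigl\langle \widehat{m}_t^\alpha - c_t^\alpha, \ m_{t+1} - c_t^\alpha \bigr\rangle \le 0$.

The first inequality in (14) then follows by induction: the bound $2M$ for $t = 1$ is by boundedness of $m$. If the stated bound holds for $d_t^\alpha$, then

\[
d_{t+1}^\alpha \le \left( 1 - \frac{2 (t+1)^\alpha}{T_{t+1}^\alpha} \right) 4 M^2 \, \frac{\sum_{s=1}^{t} s^{2\alpha}}{\bigl( T_t^\alpha \bigr)^2} + 4 M^2 \, \frac{(t+1)^{2\alpha}}{\bigl( T_{t+1}^\alpha \bigr)^2} \le 4 M^2 \, \frac{\sum_{s=1}^{t+1} s^{2\alpha}}{\bigl( T_{t+1}^\alpha \bigr)^2} ,
\]

as desired, since

\[
\left( 1 - \frac{2 (t+1)^\alpha}{T_{t+1}^\alpha} \right) \frac{1}{\bigl( T_t^\alpha \bigr)^2} = \frac{\bigl( T_t^\alpha \bigr)^2 - (t+1)^{2\alpha}}{\bigl( T_{t+1}^\alpha \bigr)^2 \bigl( T_t^\alpha \bigr)^2} \le \frac{1}{\bigl( T_{t+1}^\alpha \bigr)^2} .
\]

The second inequality in (14) can be proved as follows. First, for all $\alpha \ge 0$, by comparing sums and integrals, we get that for all $t \ge 1$,

\[
\frac{t^{\alpha+1}}{\alpha+1} = \int_0^t s^\alpha \, \mathrm{d}s \le \sum_{s=1}^{t} s^\alpha \le \int_1^{t+1} s^\alpha \, \mathrm{d}s \le \frac{(t+1)^{\alpha+1}}{\alpha+1} \le \frac{(2t)^{\alpha+1}}{\alpha+1} .
\]

Therefore,

\[
\frac{\sqrt{\sum_{t=1}^{T} t^{2\alpha}}}{T_T^\alpha} \le \frac{\alpha+1}{T^{\alpha+1}} \sqrt{\frac{(2T)^{2\alpha+1}}{2\alpha+1}} = \frac{K_\alpha}{\sqrt{T}} \qquad \text{for} \qquad K_\alpha = (\alpha+1) \sqrt{\frac{2^{2\alpha+1}}{2\alpha+1}} .
\]

This concludes the proof. □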
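The second inequality in (14) is easy to probe numerically. The following is a small sketch, not a proof: the function names are hypothetical, and the constant returned by `K` is one admissible value obtained from the sum-integral comparison (an assumption of this example, not necessarily the constant intended in the theorem).

```python
import math


def weighted_rate(alpha, T):
    """sqrt(sum_t t^(2 alpha)) / sum_t t^alpha, the middle quantity in (14) without 2M."""
    numerator = math.sqrt(sum(t ** (2 * alpha) for t in range(1, T + 1)))
    denominator = sum(t ** alpha for t in range(1, T + 1))
    return numerator / denominator


def K(alpha):
    """One admissible constant, derived from the sum-integral comparison (an assumption)."""
    return (alpha + 1) * math.sqrt(2 ** (2 * alpha + 1) / (2 * alpha + 1))


if __name__ == "__main__":
    for alpha in (0, 0.5, 1.5, 3):
        for T in (1, 10, 100, 1000):
            # The rescaled rate stays bounded by K(alpha) as T grows.
            print(alpha, T, weighted_rate(alpha, T) * math.sqrt(T), K(alpha))
```

For $\alpha = 0$ the weighted rate is exactly $1/\sqrt{T}$, which recovers the unweighted averages of standard approachability; larger values of $\alpha$ only change the constant in front of $1/\sqrt{T}$.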



The extension to polynomially weighted averages can also be obtained in the context of robust approachability, as the key to Theorem 3.1 is Lemma 3.3, which indicates that to get robust approachability, it suffices to approach, in the usual sense, $\widetilde{\mathcal{C}}$; both can thus be performed with polynomially weighted averages.

Consider now the variant of the strategy of Figure 1 for which the length of the $n$-th block, denoted by $L_n$, is equal to $n^\alpha$, the exploration rate on this block is $\gamma_n = n^{-\alpha/3}$, and $\Psi$ is an $\overline{m}$-robust approachability strategy of $\mathcal{C}$ with respect to polynomially weighted averages with parameter $\alpha = 3/2$. We call it a time-adaptive version of this strategy; note that it no longer depends on any time horizon $T$, hence guarantees can be obtained for all $T$.
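The block structure of the time-adaptive version can be made concrete as follows. This is only a sketch: the function name `block_schedule` is hypothetical, and since $n^\alpha$ is in general not an integer, the code rounds it up, which is an assumption made here for the sake of illustration.

```python
import math


def block_schedule(T, alpha=1.5):
    """Split rounds 1..T into complete blocks of lengths L_n plus k leftover rounds.

    Returns (N, k, lengths, rates) with T = sum(lengths) + k and 0 <= k < L_{N+1}.
    """
    lengths, rates, used, n = [], [], 0, 1
    while True:
        length = math.ceil(n ** alpha)   # assumption: n^alpha is rounded up
        if used + length > T:
            break
        lengths.append(length)
        rates.append(n ** (-alpha / 3))  # exploration rate gamma_n on block n
        used += length
        n += 1
    return len(lengths), T - used, lengths, rates


if __name__ == "__main__":
    N, k, lengths, rates = block_schedule(1000)
    print(N, k, lengths[:5], [round(g, 3) for g in rates[:5]])
```

With $\alpha = 3/2$ the product $\gamma_n L_n$ grows like $n$, so later blocks provide increasingly accurate estimates of the signal distributions while exploring less often.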



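Theorem 6.3 The time-adaptive version of the strategy described in Figure 1 (with $L_n = n^\alpha$ and $\gamma_n = n^{-\alpha/3}$ for $\alpha = 3/2$) ensures that, for all $T \ge 1$, with probability at least $1 - \delta$,

\[
\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) \right\|_2 \le \Box \left( T^{-1/5} \sqrt{\ln \frac{T}{\delta}} + T^{-2/5} \ln \frac{T}{\delta} \right)
\]

for some constant $\Box$ depending only on $\mathcal{C}$ and the game $(r, H)$ at hand.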
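Proof. The proof follows closely the one of Theorem 6.1. We choose $N$ so as to write $T = T_N^\alpha + k$ where $T_N^\alpha = \sum_{n=1}^{N} n^\alpha$ is the total length of the first $N$ blocks and $0 \le k \le L_{N+1} - 1$. We adapt step 1 as follows,

\[
\left\| \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) - \frac{1}{T_N^\alpha} \sum_{t=1}^{T_N^\alpha} r(I_t, J_t) \right\|_2 \le R \left( \frac{k}{T} + \left( \frac{1}{T_N^\alpha} - \frac{1}{T} \right) T_N^\alpha \right) = \frac{2k}{T} R \le \frac{2 L_{N+1}}{T} R .
\]

Second, as in step 2, we resort again to the Hoeffding-Azuma inequality for sums of Hilbert space-valued martingale differences; with probability at least $1 - \delta$,

\[
\left\| \frac{1}{T_N^\alpha} \sum_{t=1}^{T_N^\alpha} r(I_t, J_t) - \frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, r\bigl( p_n, \widehat{q}_n \bigr) \right\|_2 \le 4R \sqrt{\frac{\ln(2/\delta)}{T_N^\alpha}} .
\]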

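In view of the choice $\gamma_n = n^{-\alpha/3}$, step 3 translates here to

\[
\left\| \frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, r\bigl( p_n, \widehat{q}_n \bigr) - \frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, r\bigl( x_n, \widehat{q}_n \bigr) \right\|_2 \le 2R \, \frac{\sum_{n=1}^{N} \gamma_n \, n^\alpha}{\sum_{n=1}^{N} n^\alpha} = 2R \, \frac{\sum_{n=1}^{N} n^{2\alpha/3}}{T_N^\alpha} = 2R \, \frac{T_N^{(2\alpha/3)}}{T_N^\alpha} ,
\]

where, for $\beta \ge 0$, we write $T_N^{(\beta)} = \sum_{n=1}^{N} n^\beta$ (so that $T_N^\alpha = T_N^{(\alpha)}$).

The same argument as the one at the beginning of the proof of Theorem 6.1 shows that

\[
\frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, r\bigl( x_n, \widehat{q}_n \bigr) \in \frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, \overline{m}\Bigl( \theta_n, \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr) .
\]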

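Step 4 also starts with an application of Lemma 6.3 together with the Lipschitzness of $\Phi$ to get that for each regime $n = 1, \ldots, N$, with probability at least $1 - \delta$,

\[
\Bigl\| \Phi\bigl( \widehat{\sigma}_n \bigr) - \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr\|_2 \le \kappa_\Phi \sqrt{N_{\mathcal{I}} N_{\mathcal{H}}} \left( \sqrt{\frac{2 N_{\mathcal{I}}}{\gamma_n L_n} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta}} + \frac{1}{3} \frac{N_{\mathcal{I}}}{\gamma_n L_n} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta} \right) .
\]

By a union bound, the above bound holds for all regimes $n = 1, \ldots, N$ with probability at least $1 - N\delta$. Then, an application of Lemma 3.1 shows that

\[
\frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, \overline{m}\Bigl( \theta_n, \Phi\bigl( \widetilde{H}(\widehat{q}_n) \bigr) \Bigr) \quad \text{is in an } \varepsilon_N\text{-neighborhood of} \quad \frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, \overline{m}\bigl( \theta_n, \Phi(\widehat{\sigma}_n) \bigr) ,
\]

where, substituting the values of $L_n = n^\alpha$ and $\gamma_n = n^{-\alpha/3}$, and writing $T_N^{(\beta)} = \sum_{n=1}^{N} n^\beta$ and $T_N^\alpha = T_N^{(\alpha)}$,

\[
\varepsilon_N = R \, \kappa_\Phi \sqrt{N_{\mathcal{I}} N_{\mathcal{H}} N_{\mathcal{B}}} \ \frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \left( \sqrt{\frac{2 N_{\mathcal{I}}}{\gamma_n L_n} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta}} + \frac{1}{3} \frac{N_{\mathcal{I}}}{\gamma_n L_n} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta} \right)
\]
\[
= R \, \kappa_\Phi \sqrt{N_{\mathcal{I}} N_{\mathcal{H}} N_{\mathcal{B}}} \left( \frac{T_N^{(2\alpha/3)}}{T_N^\alpha} \sqrt{2 N_{\mathcal{I}} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta}} + \frac{T_N^{(\alpha/3)}}{T_N^\alpha} \, \frac{N_{\mathcal{I}}}{3} \ln \frac{2 N_{\mathcal{I}} N_{\mathcal{H}}}{\delta} \right) .
\]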

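It then suffices, as in step 5 of the original proof, to write the convergence rates for robust approachability guaranteed by the strategy $\Psi$. By combining the result of Lemma 6.2 with Theorem 6.2 and Lemma 3.3, we get that every element of the set

\[
\frac{1}{T_N^\alpha} \sum_{n=1}^{N} n^\alpha \, \overline{m}\bigl( \theta_n, \Phi(\widehat{\sigma}_n) \bigr)
\]

is at $\ell^2$-distance from $\mathcal{C}$ at most

\[
\frac{2 R K_\alpha}{\sqrt{N}} \sqrt{N_{\mathcal{A}} N_{\mathcal{B}}} .
\]

Putting all things together and applying a union bound, we obtain that with probability at least $1 - \delta$,

\[
\inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) \right\|_2 = O\left( \frac{(N+1)^\alpha}{T} + \sqrt{\frac{\ln(N/\delta)}{T_N^\alpha}} + \frac{T_N^{(2\alpha/3)}}{T_N^\alpha} \sqrt{\ln \frac{N}{\delta}} + \frac{T_N^{(\alpha/3)}}{T_N^\alpha} \ln \frac{N}{\delta} + \frac{1}{\sqrt{N}} \right) ,
\]

where $T_N^{(\beta)} = \sum_{n=1}^{N} n^\beta$ for $\beta \ge 0$ and $T_N^\alpha = T_N^{(\alpha)}$. Since (as proved at the end of Theorem 6.2) $T_N^{(\beta)} \sim N^{\beta+1}/(\beta+1)$ for all $\beta \ge 0$, we get that

\[
N \sim \bigl( (\alpha+1) \, T \bigr)^{1/(\alpha+1)} \qquad \text{and} \qquad T_N^{(\beta)} \sim \Box_{\alpha,\beta} \, T^{(\beta+1)/(\alpha+1)} ,
\]

where $\Box_{\alpha,\beta}$ is a constant that only depends on $\alpha$ and $\beta$. Choosing $\alpha = 3/2$ and substituting these equivalences ensures the result. □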
6.2 Application to regret minimization. In this section we analyze external and internal regret minimization in repeated games with partial monitoring from the approachability perspective. We show how, in particular, to efficiently minimize regret in both setups using the results developed for vector-valued games with partial monitoring; to do so, we indicate why the assumption of bi-piecewise linearity (Assumption 6.1) is satisfied.

6.2.1 External regret. We consider in this section the framework and aim introduced by Rustichini [29] and studied, sometimes in special cases, by Piccolboni and Schindelhauer [26], Mannor and Shimkin [17], Cesa-Bianchi et al. [6], and Lugosi et al. [16]. We show that our general strategy can be used for regret minimization.

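Scalar payoffs are obtained (but not observed) by the first player, i.e., $d = 1$: the payoff function $r$ is a mapping $\mathcal{I} \times \mathcal{J} \to \mathbb{R}$; we still denote by $R$ a bound on $|r|$. We define in this section

\[
\widehat{q}_T = \frac{1}{T} \sum_{t=1}^{T} \delta_{J_t}
\]

as the empirical distribution of the actions taken by the second player during the first $T$ rounds. (This is in contrast with the notation $\widehat{q}_n$ used in the previous section to denote such an empirical distribution, but only taken within regime $n$.)

The external regret of the first player at round $T$ equals by definition

\[
R_T^{\mathrm{ext}} = \max_{p \in \Delta(\mathcal{I})} \rho\bigl( p, \widetilde{H}(\widehat{q}_T) \bigr) - \frac{1}{T} \sum_{t=1}^{T} r(I_t, J_t) ,
\]

where $\rho : \Delta(\mathcal{I}) \times \mathcal{F} \to \mathbb{R}$ is defined as follows: for all $p \in \Delta(\mathcal{I})$ and $\sigma \in \mathcal{F}$,

\[
\rho(p, \sigma) = \min \bigl\{ r(p, q) : \ q \ \text{such that} \ \widetilde{H}(q) = \sigma \bigr\} .
\]

The function $\rho$ is continuous in its first argument and therefore the supremum in the defining expression of $R_T^{\mathrm{ext}}$ is a maximum.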

We recall briefly why, intuitively, this is the natural notion of external regret to consider in this case. Indeed, the first term in the definition of $R_T^{\mathrm{ext}}$ is (close to) the worst-case average payoff obtained by the first player when playing consistently a mixed action $p$ against a sequence of mixed actions inducing on average the same laws on the signals as the sequence of actions actually played.

The following result is an easy consequence of Theorem 6.3, as is explained below; it corresponds to the main result of Lugosi et al. [16], with the same convergence rate but with a different strategy. (However, Section 2.3 of Perchet exhibited an efficient strategy achieving a convergence rate of order $T^{-1/3}$, which is optimal; a question that remains open is thus whether the rates exhibited in Theorem 6.3 could be improved.)

Corollary 6.1 The first player has a strategy such that for all $T$ and all strategies of the second player, with probability at least $1 - \delta$,

\[
R_T^{\mathrm{ext}} \le \Box \left( T^{-1/5} \sqrt{\ln \frac{T}{\delta}} + T^{-2/5} \ln \frac{T}{\delta} \right)
\]

for some constant $\Box$ depending only on the game $(r, H)$ at hand.

The proof below is an extension to the setting of partial monitoring of the original proof and strategy of Blackwell [3] for the case of external regret under full monitoring: in the latter case the vector-payoff function $\underline{r}$ and the set $\mathcal{C}$ considered in our proof are equal to the ones considered by Blackwell.

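Proof. We embed $\mathcal{F}$ into $\mathbb{R}^{\mathcal{H} \times \mathcal{I}}$ so that in this proof we will be working in the vector space $\mathbb{R}^d = \mathbb{R} \times \mathbb{R}^{\mathcal{H} \times \mathcal{I}}$. We consider the closed convex set $\mathcal{C}$ and the vector-valued payoff function $\underline{r}$ respectively defined by

\[
\mathcal{C} = \Bigl\{ (z, \sigma) \in \mathbb{R} \times \mathcal{F} : \ z \ge \max_{p \in \Delta(\mathcal{I})} \rho(p, \sigma) \Bigr\} \qquad \text{and} \qquad \underline{r}(i, j) = \begin{bmatrix} r(i, j) \\ \widetilde{H}(\delta_j) \end{bmatrix}
\]

for all $(i, j) \in \mathcal{I} \times \mathcal{J}$.

We first show that Condition (APM) is satisfied for the considered convex set $\mathcal{C}$ and game $(\underline{r}, H)$. To do so, by continuity of $\rho$ in its first argument, we associate with each $q \in \Delta(\mathcal{J})$ an element $\phi(q) \in \Delta(\mathcal{I})$ such that

\[
\phi(q) \in \operatorname*{argmax}_{p \in \Delta(\mathcal{I})} \rho\bigl( p, \widetilde{H}(q) \bigr) .
\]

Then, given any $q \in \Delta(\mathcal{J})$, we note that for all $q'$ satisfying $\widetilde{H}(q') = \widetilde{H}(q)$, we have by definition of $\rho$,

\[
r\bigl( \phi(q), q' \bigr) \ge \rho\bigl( \phi(q), \widetilde{H}(q') \bigr) = \max_{p \in \Delta(\mathcal{I})} \rho\bigl( p, \widetilde{H}(q') \bigr) ,
\]

which shows that $\underline{r}\bigl( \phi(q), q' \bigr) \in \mathcal{C}$. The required condition is thus satisfied.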

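We then show that Assumption 6.1 is satisfied. To do so, we will actually prove the stronger property that the mappings $\overline{m}(\,\cdot\,, \sigma)$ are piecewise linear for all $\sigma \in \mathcal{F}$; we fix such a $\sigma$ in the sequel. Only the first coordinate $r$ of $\underline{r}$ depends on $p$, so the desired property is true if and only if the mapping $\overline{m}_1(\,\cdot\,, \sigma)$ defined by

$$p \in \Delta(\mathcal{I}) \ \longmapsto \ \overline{m}_1(p, \sigma) = \bigl\{ r(p, q) : \ q \in \Delta(\mathcal{J}) \text{ such that } H(q) = \sigma \bigr\}$$

is piecewise linear. Since $H$ is linear, the set

$$\bigl\{ q \in \Delta(\mathcal{J}) \text{ such that } H(q) = \sigma \bigr\}$$

is a polytope, thus the convex hull of some finite set $\{ q_{\sigma,1}, \ldots, q_{\sigma,M_\sigma} \} \subset \Delta(\mathcal{J})$. Therefore, for every $p \in \Delta(\mathcal{I})$, by linearity of $r$ (and by the fact that it takes one-dimensional values),

$$\overline{m}_1(p, \sigma) = \operatorname{co}\bigl\{ r(p, q_{\sigma,1}), \ldots, r(p, q_{\sigma,M_\sigma}) \bigr\} = \Bigl[ \min_{k \in \{1,\ldots,M_\sigma\}} r(p, q_{\sigma,k}), \ \max_{k' \in \{1,\ldots,M_\sigma\}} r(p, q_{\sigma,k'}) \Bigr] , \qquad (15)$$

where co stands for the convex hull. Since all applications $r(\,\cdot\,, q_{\sigma,k})$ are linear, their minimum and their maximum are piecewise linear functions, thus $\overline{m}_1(\,\cdot\,, \sigma)$ is also piecewise linear. Assumption 6.1 is thus satisfied, as claimed.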
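The interval structure used in this argument can be checked numerically. The following is a minimal sketch, not the authors' implementation: the payoff matrix `PAYOFF` and the vertices `VERTICES` of the polytope of mixed actions compatible with a given flag are hypothetical values chosen for illustration. It computes the interval of payoffs as the minimum and maximum of finitely many linear functions of p, and its lower end, which plays the role of the pessimistic payoff.

```python
from itertools import product

def r_mixed(payoff, p, q):
    """Bilinear extension r(p, q) = sum_{i,j} p[i] * q[j] * payoff[i][j]."""
    return sum(p[i] * q[j] * payoff[i][j]
               for i, j in product(range(len(p)), range(len(q))))

def interval_m1(payoff, p, vertices):
    """Interval [min_k r(p, q_k), max_k r(p, q_k)] spanned by the vertices q_k."""
    values = [r_mixed(payoff, p, q) for q in vertices]
    return min(values), max(values)

# Hypothetical 2x3 scalar payoff matrix (rows: actions of the first player).
PAYOFF = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.5]]
# Hypothetical vertices of the polytope {q : H(q) = sigma}.
VERTICES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

def rho(p):
    """Pessimistic payoff: lower end of the interval, a minimum of linear maps."""
    return interval_m1(PAYOFF, p, VERTICES)[0]

if __name__ == "__main__":
    for p in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
        print(p, interval_m1(PAYOFF, p, VERTICES), rho(p))
```

Being a minimum of linear functions, `rho` is concave and piecewise linear in p, which is the property the proof relies on.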



Theorem 6.1 can therefore be applied to exhibit the convergence rates; we simply need to relate the quantity of interest here to the one considered therein. To that end we use the fact that the mapping

$$\sigma \in \mathcal{F} \ \longmapsto \ \max_{p \in \Delta(\mathcal{I})} \rho(p, \sigma)$$

is Lipschitz, with Lipschitz constant in $\ell^2$-norm denoted by $L_\rho$; the proof of this fact is detailed below.

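Now, the regret is nonpositive as soon as $\frac{1}{T} \sum_{t=1}^T \underline{r}(I_t, J_t)$ belongs to $\mathcal{C}$; we therefore only need to consider the case when this average is not in $\mathcal{C}$. In the latter case, we denote by $(\widetilde{r}_T, \widetilde{\sigma}_T)$ its projection in $\ell^2$-norm onto $\mathcal{C}$. We have first that the defining inequality of $\mathcal{C}$ is an equality on its border, so that

$$\widetilde{r}_T = \max_{p \in \Delta(\mathcal{I})} \rho(p, \widetilde{\sigma}_T) ;$$

and second, that

$$\begin{aligned}
R_T^{\mathrm{ext}} & = \max_{p \in \Delta(\mathcal{I})} \rho\bigl(p, H(q_T)\bigr) - \frac{1}{T} \sum_{t=1}^T r(I_t, J_t) \\
& \leq \Bigl| \max_{p \in \Delta(\mathcal{I})} \rho\bigl(p, H(q_T)\bigr) - \max_{p \in \Delta(\mathcal{I})} \rho(p, \widetilde{\sigma}_T) \Bigr| + \Bigl| \widetilde{r}_T - \frac{1}{T} \sum_{t=1}^T r(I_t, J_t) \Bigr| \\
& \leq \max\{L_\rho, 1\} \left( \bigl\| \widetilde{\sigma}_T - H(q_T) \bigr\|_2 + \Bigl| \widetilde{r}_T - \frac{1}{T} \sum_{t=1}^T r(I_t, J_t) \Bigr| \right) \\
& \leq \sqrt{2} \, \max\{L_\rho, 1\} \ \inf_{c \in \mathcal{C}} \Bigl\| c - \frac{1}{T} \sum_{t=1}^T \underline{r}(I_t, J_t) \Bigr\|_2 .
\end{aligned}$$

The claimed rates are now seen to follow from the ones indicated in Theorem 6.1.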

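It only remains to prove the indicated Lipschitzness. (All Lipschitzness statements that follow will be with respect to the $\ell^2$-norms.) We have by Definition 6.1 that for all $p \in \Delta(\mathcal{I})$ and $\sigma \in \mathcal{F}$,

$$\rho(p, \sigma) = \min \overline{m}_1\bigl(p, \Phi(\sigma)\bigr) ,$$

where the linear extension $\overline{m}_1$ can indifferently be taken as the extension built from the first component alone or as the projection onto the first component of the extension built from the whole vector-valued mapping. By Remark 6.1 the mapping $\sigma \in \mathcal{F} \mapsto \Phi(\sigma)$ is $\kappa_\Phi$-Lipschitz; this entails, by Lemma 5.1, that for all $p \in \Delta(\mathcal{I})$, the mapping $\sigma \in \mathcal{F} \mapsto \rho(p, \sigma)$ is $R \sqrt{N_{\mathcal{B}}} \, \kappa_\Phi$-Lipschitz. In particular, since the latter Lipschitz constant is independent of $p$, the mapping

$$\sigma \in \mathcal{F} \ \longmapsto \ \max_{p \in \Delta(\mathcal{I})} \rho(p, \sigma)$$

is $R \sqrt{N_{\mathcal{B}}} \, \kappa_\Phi$-Lipschitz as well, which concludes the proof. □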



A similar argument to the one in Perchet [24] shows that the convex set $\mathcal{C}$ is defined by a finite number of piecewise linear equations; it is therefore a polyhedron, so the projection onto it, and hence the computation of the strategy, can be done efficiently. We sketch the argument below, and refer the reader to Perchet [23] for details. Equation (15) indicates a priori that for each $\sigma \in \mathcal{F}$, there exist a finite number $M_\sigma$ (depending on $\sigma$) of mixed actions $q_{\sigma,1}, \ldots, q_{\sigma,M_\sigma}$ such that for all $p \in \Delta(\mathcal{I})$, we have $\rho(p, \sigma) = \min\bigl\{ r(p, q_{\sigma,1}), \ldots, r(p, q_{\sigma,M_\sigma}) \bigr\}$. By an argument stated in Perchet [24], the set

$$\bigl\{ q \in \Delta(\mathcal{J}) \text{ such that } H(q) = \sigma \bigr\}$$

evolves in a piecewise linear way and thus there exist a finite number $M$ of piecewise linear functions $\sigma \in \mathcal{F} \mapsto q'_{\sigma,k}$, with $k = 1, \ldots, M$, such that, for all $\sigma \in \mathcal{F}$,

$$\bigl\{ q \in \Delta(\mathcal{J}) \text{ such that } H(q) = \sigma \bigr\} = \operatorname{co}\bigl\{ q'_{\sigma,1}, \ldots, q'_{\sigma,M} \bigr\} .$$

(There can be some redundancies between the $q'_{\sigma,k}$.) Because of this, we have that for all $p \in \Delta(\mathcal{I})$ and $\sigma \in \mathcal{F}$,

$$\rho(p, \sigma) = \min\bigl\{ r(p, q'_{\sigma,1}), \ldots, r(p, q'_{\sigma,M}) \bigr\} .$$

Each function $\sigma \mapsto q'_{\sigma,k}$ being piecewise linear, one can construct a finite set $\{p_1, \ldots, p_K\} \subset \Delta(\mathcal{I})$ such that, for any $\sigma \in \mathcal{F}$, the mapping $p \mapsto \rho(p, \sigma)$ is maximized at one of these $p_k$. The convex set $\mathcal{C}$ is therefore defined by a finite number of piecewise linear equations and is thus a polyhedron; the projection onto it, and hence the computation of the proposed strategy, can be done efficiently.



6.2.2 Internal / swap regret. Foster and Vohra [10] defined internal regret with full monitoring as follows. A player has no internal regret if, for every action $i \in \mathcal{I}$, he has no external regret on the stages when this specific action $i$ was played. In other words, $i$ is the best response to the empirical distribution of actions of the other player on these stages.

With partial monitoring, the first player evaluates his payoffs in a pessimistic way through the function $\rho$ defined above. This function is not linear over $\Delta(\mathcal{I})$ in general (it is concave), so that the best responses are not necessarily pure actions $i \in \mathcal{I}$ but mixed actions, i.e., elements of $\Delta(\mathcal{I})$. Following Lehrer and Solan [15], one can therefore partition the stages not depending on the pure actions actually played but on the mixed actions $p_t \in \Delta(\mathcal{I})$ used to draw them. To this end, it is convenient to assume that the strategies of the first player need to pick these mixed actions in a finite (but possibly thin) grid of $\Delta(\mathcal{I})$, which we denote by $\{p_g, \ g \in \mathcal{G}\}$, where $\mathcal{G}$ is a finite set. At each round $t$, the first player picks an index $G_t \in \mathcal{G}$ and uses the distribution $p_{G_t}$ to draw his action $I_t$. Up to a standard concentration-of-the-measure argument, we will measure the payoff at round $t$ with $r(p_{G_t}, J_t)$ rather than with $r(I_t, J_t)$.

For each $g \in \mathcal{G}$, we denote by $N_T(g)$ the number of stages in $\{1, \ldots, T\}$ for which we had $G_t = g$ and, whenever $N_T(g) > 0$,

$$q_{T,g} = \frac{1}{N_T(g)} \sum_{t \,:\, G_t = g} \delta_{J_t} .$$

We define $q_{T,g}$ in an arbitrary way when $N_T(g) = 0$. The internal regret of the first player at round $T$ is measured as

$$R_T^{\mathrm{int}} = \max_{g, g' \in \mathcal{G}} \ \frac{N_T(g)}{T} \Bigl( \rho\bigl(p_{g'}, H(q_{T,g})\bigr) - r(p_g, q_{T,g}) \Bigr) .$$

Actually, our proof technique rather leads to the minimization of some swap regret (see Blum and Mansour [4] for the definition of swap regret in full monitoring):

$$R_T^{\mathrm{swap}} = \sum_{g \in \mathcal{G}} \frac{N_T(g)}{T} \Bigl( \max_{g' \in \mathcal{G}} \rho\bigl(p_{g'}, H(q_{T,g})\bigr) - r(p_g, q_{T,g}) \Bigr) .$$

Again, the following bound on the swap regret easily follows from Theorem 6.1; the latter constructs a simple and direct strategy to control the swap regret, thus also the internal regret. It therefore improves the results of Lehrer and Solan [15] and Perchet [22], two articles that presented more involved and less efficient strategies to do so (strategies based on auxiliary strategies using grids that need to be refined over time and whose complexities are exponential in the size of these grids; ideas all in all similar to what is done in calibration, see the references provided in Section 4). Moreover, we provide convergence rates.
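To make the quantity being controlled concrete, the sketch below evaluates a swap regret from a recorded play history. It is an illustration under stated assumptions, not the strategy of the paper: the formula implemented is the sum over grid indices g of (N_T(g)/T) times the gap between the best pessimistic payoff over the grid and the payoff actually obtained, evaluated at the empirical distribution of the opponent's actions on the stages where g was picked. The function names and the argument `rho` (which here takes the mixed action q directly, the caller being responsible for composing with H) are hypothetical.

```python
from collections import Counter, defaultdict

def r_mixed(payoff, p, q):
    """Bilinear payoff r(p, q) for a scalar payoff matrix."""
    return sum(p[i] * q[j] * payoff[i][j]
               for i in range(len(p)) for j in range(len(q)))

def swap_regret(grid, r, rho, history, n_outcomes):
    """Swap regret of a history [(g_t, j_t)], for a finite grid of mixed actions.

    r(p, q) is the payoff and rho(p, q) the pessimistic evaluation of p
    against the information revealed by q (both supplied by the caller).
    """
    T = len(history)
    counts = defaultdict(Counter)
    for g, j in history:
        counts[g][j] += 1
    total = 0.0
    for g, c in counts.items():
        n_g = sum(c.values())
        # Empirical distribution of the opponent's actions on stages where g was used.
        q = [c[j] / n_g for j in range(n_outcomes)]
        best = max(rho(p, q) for p in grid)
        total += (n_g / T) * (best - r(grid[g], q))
    return total

if __name__ == "__main__":
    payoff = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical game
    grid = [(1.0, 0.0), (0.0, 1.0)]              # grid reduced to the pure actions
    r = lambda p, q: r_mixed(payoff, p, q)
    history = [(0, 0), (0, 1), (1, 0), (1, 0)]
    # Full monitoring is used as the simplest special case: rho coincides with r.
    print(swap_regret(grid, r, r, history, n_outcomes=2))
```

With full monitoring, `rho` can be taken equal to `r`, which recovers the usual swap regret over the grid; under partial monitoring a caller would pass the pessimistic payoff instead.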

Corollary 6.2 The first player has an explicit strategy such that for all $T$ and all strategies of the second player, with probability at least $1 - \delta$,

$$R_T^{\mathrm{swap}} \leq \square \left( T^{-1/5} \sqrt{\ln \frac{1}{\delta}} + T^{-2/5} \ln \frac{1}{\delta} \right)$$

for some constant □ depending only on the game $(r, H)$ at hand and on the size of the finite grid $\mathcal{G}$.

Proof. The proof of this corollary is based on ideas similar to the ones used in the proof of Corollary 6.1; $\mathcal{G}$ will play the role of the action set of the first player. The proof proceeds in four steps. In the first step, we construct an approachability setup and show that Condition (APM) applies. In the second step, we show that Assumption 6.1 is satisfied. In the third step, we analyze the convergence rates of the swap regret. In the fourth and final step, we show that the set we are approaching possesses some smoothness properties by providing a uniform Lipschitz bound on certain functions.

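Step 1: We denote by

$$\mathcal{F}_{\mathrm{cone}} = \bigl\{ \lambda \sigma, \ \ \sigma \in \mathcal{F}, \ \lambda \in \mathbb{R}_+ \bigr\}$$

the cone generated by $\mathcal{F}$ and extend linearly $\rho : \Delta(\mathcal{I}) \times \mathcal{F} \to \mathbb{R}$ into a mapping $\rho : \Delta(\mathcal{I}) \times \mathcal{F}_{\mathrm{cone}} \to \mathbb{R}$ as follows: for all $p \in \Delta(\mathcal{I})$, for all $\lambda \geq 0$, and all $\sigma \in \mathcal{F}$,

$$\rho(p, \lambda \sigma) = \begin{cases} 0 & \text{if } \lambda = 0, \\ \lambda \, \rho(p, \sigma) & \text{if } \lambda > 0. \end{cases}$$

In the sequel, we embed $\mathcal{F}_{\mathrm{cone}}$ into $\mathbb{R}^{N_{\mathcal{I}} N_{\mathcal{H}}}$.

The closed convex set $\mathcal{C}$ and the vector-valued payoff function $\underline{r}$ are then respectively defined by

$$\mathcal{C} = \Bigl\{ (z_g, v_g)_{g \in \mathcal{G}} \in \bigl( \mathbb{R} \times \mathcal{F}_{\mathrm{cone}} \bigr)^{\mathcal{G}} : \ \ \forall g \in \mathcal{G}, \ \ z_g \geq \max_{g' \in \mathcal{G}} \rho(p_{g'}, v_g) \Bigr\}$$

and, for all $(g, j) \in \mathcal{G} \times \mathcal{J}$,

$$\underline{r}(g, j) = \begin{bmatrix} r(p_g, j) \, \mathbb{1}_{\{g' = g\}} \\ H(\delta_j) \, \mathbb{1}_{\{g' = g\}} \end{bmatrix}_{g' \in \mathcal{G}} .$$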

To show that $\mathcal{C}$ is $\underline{r}$-approachable, we associate with each $q \in \Delta(\mathcal{J})$ an element $g^\star(q) \in \mathcal{G}$ such that

$$g^\star(q) \in \operatorname*{argmax}_{g \in \mathcal{G}} \rho\bigl(p_g, H(q)\bigr) .$$

Then, given any $q \in \Delta(\mathcal{J})$, we note that for all $q'$ satisfying $H(q') = H(q)$, the components of the vector $\underline{r}(g^\star(q), q')$ are all null but the ones corresponding to $g^\star(q)$, for which we have

$$r\bigl(p_{g^\star(q)}, q'\bigr) \geq \rho\bigl(p_{g^\star(q)}, H(q')\bigr) = \rho\bigl(p_{g^\star(q)}, H(q)\bigr) = \max_{g' \in \mathcal{G}} \rho\bigl(p_{g'}, H(q)\bigr) = \max_{g' \in \mathcal{G}} \rho\bigl(p_{g'}, H(q')\bigr) ,$$

where the first inequality is by definition of $\rho$. Therefore, $\underline{r}(g^\star(q), q') \in \mathcal{C}$. Condition (APM) in Lemma 6.2 and Theorem 6.1 is thus satisfied, so that we have approachability.

Step 2: We then show that Assumption 6.1 is satisfied. It suffices to show that for all $\sigma \in \mathcal{F}$, the application

$$\pi = (\pi_g)_{g \in \mathcal{G}} \in \Delta(\mathcal{G}) \ \longmapsto \ \overline{m}_1(\pi, \sigma) = \Bigl\{ \bigl( \pi_g \, r(p_g, q) \bigr)_{g \in \mathcal{G}} : \ q \in \Delta(\mathcal{J}) \text{ such that } H(q) = \sigma \Bigr\}$$

is piecewise linear (as the other components in the definition of $\overline{m}$ are linear in $\pi$). This is the case since for each $g$, the application

$$\pi \in \Delta(\mathcal{G}) \ \longmapsto \ \bigl\{ \pi_g \, r(p_g, q) : \ q \in \Delta(\mathcal{J}) \text{ such that } H(q) = \sigma \bigr\}$$

is seen to be piecewise linear, by using the same one-dimensional argument as in the proof of Corollary 6.1.

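Step 3: We now exhibit the convergence rates. In view of the form of the defining set of constraints for $\mathcal{C}$, the coordinates of the elements in $\mathcal{C}$ can be grouped according to each $g \in \mathcal{G}$ and projections onto $\mathcal{C}$ can therefore be done separately for each such group. The group $g$ of coordinates of $\frac{1}{T} \sum_{t=1}^T \underline{r}(G_t, J_t)$ is formed by

$$\frac{N_T(g)}{T} \, r(p_g, q_{T,g}) \qquad \text{and} \qquad \frac{N_T(g)}{T} \, H(q_{T,g}) ;$$

when

$$\frac{N_T(g)}{T} \, r(p_g, q_{T,g}) \geq \max_{g' \in \mathcal{G}} \rho\Bigl( p_{g'}, \ \frac{N_T(g)}{T} H(q_{T,g}) \Bigr) ,$$

we denote these quantities by $\widetilde{r}_{T,g}$ and $\widetilde{v}_{T,g}$. Otherwise, we project this pair on the set

$$\mathcal{C}_g = \Bigl\{ (z_g, v_g) \in \mathbb{R} \times \mathcal{F}_{\mathrm{cone}} : \ \ z_g \geq \max_{g' \in \mathcal{G}} \rho(p_{g'}, v_g) \Bigr\}$$

and denote by $\widetilde{r}_{T,g}$ and $\widetilde{v}_{T,g}$ the coordinates of the projection; they satisfy the defining inequality of $\mathcal{C}_g$ with equality,

$$\widetilde{r}_{T,g} = \max_{g' \in \mathcal{G}} \rho(p_{g'}, \widetilde{v}_{T,g}) .$$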

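By distinguishing for each $g$ according to which of the two cases above arose (for the first inequality), we may decompose and upper bound the swap regret as follows,

$$\begin{aligned}
R_T^{\mathrm{swap}} & = \sum_{g \in \mathcal{G}} \frac{N_T(g)}{T} \Bigl( \max_{g' \in \mathcal{G}} \rho\bigl(p_{g'}, H(q_{T,g})\bigr) - r(p_g, q_{T,g}) \Bigr) \\
& = \sum_{g \in \mathcal{G}} \left( \max_{g' \in \mathcal{G}} \rho\Bigl(p_{g'}, \frac{N_T(g)}{T} H(q_{T,g})\Bigr) - \frac{N_T(g)}{T} \, r(p_g, q_{T,g}) \right) \\
& \leq \sum_{g \in \mathcal{G}} \left| \max_{g' \in \mathcal{G}} \rho\Bigl(p_{g'}, \frac{N_T(g)}{T} H(q_{T,g})\Bigr) - \max_{g' \in \mathcal{G}} \rho(p_{g'}, \widetilde{v}_{T,g}) \right| + \sum_{g \in \mathcal{G}} \left| \widetilde{r}_{T,g} - \frac{N_T(g)}{T} \, r(p_g, q_{T,g}) \right| \\
& \leq \overline{L}_\rho \sum_{g \in \mathcal{G}} \left\| \frac{N_T(g)}{T} H(q_{T,g}) - \widetilde{v}_{T,g} \right\|_2 + \sum_{g \in \mathcal{G}} \left| \frac{N_T(g)}{T} \, r(p_g, q_{T,g}) - \widetilde{r}_{T,g} \right| ,
\end{aligned}$$

where we used a fact proved below, that the application

$$v \in \mathcal{F}_{\mathrm{cone}} \ \longmapsto \ \max_{g' \in \mathcal{G}} \rho(p_{g'}, v)$$

is $\overline{L}_\rho$-Lipschitz. In the last inequality we had a sum of $\ell^2$-norms, which can be bounded by a single $\ell^2$-norm,

$$R_T^{\mathrm{swap}} \leq \max\bigl\{ \overline{L}_\rho, 1 \bigr\} \sqrt{2 N_{\mathcal{G}}} \ \left\| \begin{bmatrix} \frac{N_T(g)}{T} \, r(p_g, q_{T,g}) - \widetilde{r}_{T,g} \\ \frac{N_T(g)}{T} \, H(q_{T,g}) - \widetilde{v}_{T,g} \end{bmatrix}_{g \in \mathcal{G}} \right\|_2 = \max\bigl\{ \overline{L}_\rho, 1 \bigr\} \sqrt{2 N_{\mathcal{G}}} \ \inf_{c \in \mathcal{C}} \left\| c - \frac{1}{T} \sum_{t=1}^T \underline{r}(G_t, J_t) \right\|_2 ,$$

where we denoted by $N_{\mathcal{G}}$ the cardinality of $\mathcal{G}$. Resorting to the convergence rate stated in Theorem 6.1 concludes the proof, up to the claimed Lipschitzness, which we now prove. (All Lipschitzness statements that follow will be with respect to the $\ell^2$-norms.)

Step 4: To do so, it suffices to show that for all fixed elements $p \in \Delta(\mathcal{I})$, the functions $v \in \mathcal{F}_{\mathrm{cone}} \mapsto \rho(p, v)$ are Lipschitz, with a Lipschitz constant $\overline{L}_\rho$ that is independent of $p$. Note that we already proved at the end of the proof of Corollary 6.1 that $\sigma \in \mathcal{F} \mapsto \rho(p, \sigma)$ is Lipschitz, with a Lipschitz constant $L_\rho$ independent of $p$. Consider now two elements $v, v' \in \mathcal{F}_{\mathrm{cone}}$, which we write as $v = \lambda \sigma$ and $v' = \lambda' \sigma'$, with $\sigma, \sigma' \in \mathcal{F}$ and $\lambda, \lambda' \in \mathbb{R}_+$. Using triangle inequalities, the Lipschitzness of $\rho$ on $\mathcal{F}$, and the fact that $r$, thus $\rho$, are bounded by $R$,

$$\begin{aligned}
\bigl| \rho(p, \lambda \sigma) - \rho(p, \lambda' \sigma') \bigr| & \leq \bigl| \lambda \bigl( \rho(p, \sigma) - \rho(p, \sigma') \bigr) \bigr| + \bigl| (\lambda - \lambda') \, \rho(p, \sigma') \bigr| \\
& \leq \lambda L_\rho \, \| \sigma - \sigma' \|_2 + R \, | \lambda - \lambda' | \\
& = L_\rho \, \bigl\| \lambda \sigma - \lambda' \sigma' + (\lambda' - \lambda) \sigma' \bigr\|_2 + R \, | \lambda - \lambda' | \\
& \leq L_\rho \, \| \lambda \sigma - \lambda' \sigma' \|_2 + \bigl( R + L_\rho N_{\mathcal{I}} \bigr) \, | \lambda - \lambda' | ,
\end{aligned}$$

where we also used for the last inequality that since $\sigma'$ is a vector of $N_{\mathcal{I}}$ probability distributions over the signals, $\| \sigma' \|_2 \leq \| \sigma' \|_1 = N_{\mathcal{I}}$. To conclude the argument, we simply need to show that $| \lambda - \lambda' |$ can be bounded by $\| \lambda \sigma - \lambda' \sigma' \|_2$ up to some universal constant, which we do now. We resort again to the fact that $\| \sigma \|_1 = \| \sigma' \|_1 = N_{\mathcal{I}}$ and can thus write, thanks to a triangle inequality and assuming with no loss of generality that $\lambda' \leq \lambda$, that

$$| \lambda - \lambda' | = \frac{1}{N_{\mathcal{I}}} \Bigl( \lambda \| \sigma \|_1 - \lambda' \| \sigma' \|_1 \Bigr) \leq \frac{1}{N_{\mathcal{I}}} \, \| \lambda \sigma - \lambda' \sigma' \|_1 \leq \frac{\sqrt{N_{\mathcal{I}} N_{\mathcal{H}}}}{N_{\mathcal{I}}} \, \| \lambda \sigma - \lambda' \sigma' \|_2 ,$$

where we used the Cauchy-Schwarz inequality for the final step. One can thus take, for instance,

$$\overline{L}_\rho = L_\rho + \bigl( R + L_\rho N_{\mathcal{I}} \bigr) \sqrt{\frac{N_{\mathcal{H}}}{N_{\mathcal{I}}}} .$$

This concludes the proof. □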
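The key inequality of Step 4, which bounds the gap between the two scaling factors by the Euclidean distance between the scaled flags, can be sanity-checked numerically. The sketch below is only an illustration under assumptions: flags are modeled as concatenations of `n_actions` probability vectors over `n_signals` signals, and the constant tested is the square root of `n_signals / n_actions`, obtained from the triangle inequality in the 1-norm followed by Cauchy-Schwarz in dimension `n_actions * n_signals`. All function names are hypothetical.

```python
import math
import random

def random_flag(rng, n_actions, n_signals):
    """Concatenation of n_actions random probability vectors over n_signals signals."""
    flag = []
    for _ in range(n_actions):
        weights = [rng.random() + 1e-9 for _ in range(n_signals)]
        total = sum(weights)
        flag.extend(w / total for w in weights)
    return flag

def scale_gap_and_distance(lam, sigma, lam2, sigma2):
    """Return (|lam - lam2|, || lam * sigma - lam2 * sigma2 ||_2)."""
    diff = [lam * a - lam2 * b for a, b in zip(sigma, sigma2)]
    return abs(lam - lam2), math.sqrt(sum(d * d for d in diff))

def check_inequality(n_actions, n_signals, trials=1000, seed=0):
    """Check |lam - lam2| <= sqrt(n_signals / n_actions) * l2-distance on random draws."""
    rng = random.Random(seed)
    constant = math.sqrt(n_signals / n_actions)
    for _ in range(trials):
        sigma = random_flag(rng, n_actions, n_signals)
        sigma2 = random_flag(rng, n_actions, n_signals)
        lam, lam2 = 3 * rng.random(), 3 * rng.random()
        gap, dist = scale_gap_and_distance(lam, sigma, lam2, sigma2)
        if gap > constant * dist + 1e-9:
            return False
    return True

if __name__ == "__main__":
    print(check_inequality(n_actions=2, n_signals=3))
```

A random search of this kind cannot prove the bound, but a failure would indicate a mistake in the constant.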

7. Approachability in the case of general games with partial monitoring. Unfortunately, as is illustrated in the following example, there exist games with partial monitoring that are not bi-piecewise linear.

Example 7.1 The following game (with the same action and signal sets as in the example of Section 5) is not bi-piecewise linear.

          L                M                R
    T     (1,0,0,0)/♣      (0,0,1,0)/♣      (2,0,4,0)/♥
    B     (0,1,0,0)/♣      (0,0,0,1)/♣      (0,3,0,5)/♥

Proof. We denote mixed actions of the first player by $(p, 1-p)$, where $p \in [0,1]$ denotes the probability of playing T and $1-p$ is the probability of playing B. It is immediate that $\overline{m}\bigl((p, 1-p), \clubsuit\bigr)$ can be identified with the set of all product distributions on $2 \times 2$ elements with first marginal distribution $(p, 1-p)$. The proof of Lemma 6.1 shows that the set $\mathcal{B}$ associated with any game always contains the Dirac masses on each signal; that is, $\delta_\clubsuit \in \mathcal{B}$. But for $p \neq p'$ and $\lambda \in (0,1)$, denoting $\overline{p} = \lambda p + (1-\lambda) p'$, one necessarily has that

$$\overline{m}\bigl((\overline{p}, 1-\overline{p}), \clubsuit\bigr) \subsetneq \lambda \, \overline{m}\bigl((p, 1-p), \clubsuit\bigr) + (1-\lambda) \, \overline{m}\bigl((p', 1-p'), \clubsuit\bigr) ;$$

the inclusion $\subseteq$ holds by concavity of $\overline{m}$ in its first argument (Lemma 5.1), but this inclusion is always strict here since the left-hand side is formed by product distributions while the right-hand side also contains distributions with correlations. Hence, bi-piecewise linearity cannot hold for this game. □
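The strictness of the inclusion can be seen on a small numerical instance. The sketch below is a hedged illustration that reads the payoff vectors of columns L and M in the table of Example 7.1 as the four unit vectors, so that the payoff of a pair of mixed actions is the product distribution over the four cells; the helper names are hypothetical. A distribution over the cells (TL, BL, TM, BM) is a product distribution only if the quantity computed by `correlation_gap` vanishes.

```python
def payoff_vector(p, q_left):
    """Payoff of (p, 1-p) against (q_left, 1-q_left) on columns L and M.

    Coordinates are ordered (TL, BL, TM, BM), following the unit payoff
    vectors of Example 7.1 for these two columns.
    """
    return (p * q_left, (1 - p) * q_left,
            p * (1 - q_left), (1 - p) * (1 - q_left))

def correlation_gap(v):
    """v_TL * v_BM - v_BL * v_TM, which is zero for every product distribution."""
    return v[0] * v[3] - v[1] * v[2]

def mix(lam, v, w):
    """Convex combination lam * v + (1 - lam) * w."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(v, w))

if __name__ == "__main__":
    # Elements of the sets associated with p = 1 and p' = 0 respectively.
    v = payoff_vector(1.0, 1.0)    # all mass on cell TL
    w = payoff_vector(0.0, 0.0)    # all mass on cell BM
    mixture = mix(0.5, v, w)       # first marginal (1/2, 1/2), but correlated
    print(mixture, correlation_gap(mixture))
    print(correlation_gap(payoff_vector(0.5, 0.3)))
```

The mixture has first marginal (1/2, 1/2), yet its correlation gap is nonzero, so it belongs to the right-hand side of the inclusion without being a product distribution.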

However, we will show that if Condition (APM) holds, there exist strategies with a constant per-round complexity to approach polytopes even when the game is not bi-piecewise linear. That is, by considering simpler closed convex sets $\mathcal{C}$, no assumption is needed on the pair $(r, H)$.

We will conclude this section by indicating that, thanks to a doubling trick, Condition (APM) is still sufficient for approachability in the most general case, when no assumption is made on either $(r, H)$ or $\mathcal{C}$, at the cost, however, of inefficiency.

7.1 Approachability of the negative orthant in the case of general games. For the sake of simplicity, we start with the case of the negative orthant $\mathbb{R}^d_-$. Our argument will be based on Lemma 6.1; we use in the sequel the objects and notation introduced therein. We denote by $r = (r_k)_{1 \leq k \leq d}$ the components of the $d$-dimensional payoff function $r$ and introduce, for all $k \in \{1, \ldots, d\}$, the set-valued mapping $\widetilde{m}_k$ defined by

$$\widetilde{m}_k : (p, b) \in \Delta(\mathcal{I}) \times \mathcal{B} \ \longmapsto \ \widetilde{m}_k(p, b) = \bigl\{ r_k(p, q) : \ q \in \Delta(\mathcal{J}) \text{ such that } H(q) = b \bigr\} .$$

The mapping $\widetilde{m}$ is then defined as the Cartesian product of the $\widetilde{m}_k$; formally, for all $p \in \Delta(\mathcal{I})$ and $b \in \mathcal{B}$,

$$\widetilde{m}(p, b) = \bigl\{ (z_1, \ldots, z_d) : \ \forall k \in \{1, \ldots, d\}, \ \ z_k \in \widetilde{m}_k(p, b) \bigr\} .$$

We then linearly extend this mapping into a set-valued mapping, still denoted by $\widetilde{m}$, defined on $\Delta(\mathcal{I}) \times \Delta(\mathcal{B})$, and finally consider the set-valued mapping, again denoted by $\widetilde{m}$, defined on $\Delta(\mathcal{I}) \times \mathcal{F}$ by

$$\forall \sigma \in \mathcal{F}, \ \ \forall p \in \Delta(\mathcal{I}), \qquad \widetilde{m}(p, \sigma) = \widetilde{m}\bigl(p, \Phi(\sigma)\bigr) = \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, \widetilde{m}(p, b) ,$$

where $\Phi$ refers to the mapping defined in Lemma 6.1 (based on $\overline{m}$). The lemma below indicates why $\widetilde{m}$ is an excellent substitute for $\overline{m}$ in the case of the approachability of the orthant $\mathbb{R}^d_-$.

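Lemma 7.1 The set-valued mappings $\overline{m}$ and $\widetilde{m}$ satisfy that for all $p \in \Delta(\mathcal{I})$ and $\sigma \in \mathcal{F}$,
(i) the inclusion $\overline{m}(p, \sigma) \subseteq \widetilde{m}(p, \sigma)$ holds;
(ii) if $\overline{m}(p, \sigma) \subseteq \mathbb{R}^d_-$, then one also has $\widetilde{m}(p, \sigma) \subseteq \mathbb{R}^d_-$.

The interpretations of these two properties are that: 1. $\widetilde{m}$-robust approaching a set $\mathcal{C}$ is more difficult than $\overline{m}$-robust approaching it; and 2. if Condition (APM) holds for $\overline{m}$ and $\mathbb{R}^d_-$, it also holds for $\widetilde{m}$ and $\mathbb{R}^d_-$.

Proof. For property 1, note that by the component-wise construction of $\widetilde{m}$,

$$\forall b \in \mathcal{B}, \ \ \forall p \in \Delta(\mathcal{I}), \qquad \overline{m}(p, b) \subseteq \widetilde{m}(p, b) ;$$

Lemma 6.1, the linear extension of $\widetilde{m}$, and the definition of $\widetilde{m}$ on $\Delta(\mathcal{I}) \times \mathcal{F}$ then show that

$$\forall \sigma \in \mathcal{F}, \ \ \forall p \in \Delta(\mathcal{I}), \qquad \overline{m}(p, \sigma) = \sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, \overline{m}(p, b) \subseteq \widetilde{m}\bigl(p, \Phi(\sigma)\bigr) = \widetilde{m}(p, \sigma) .$$

As for property 2, it suffices to work component-wise. Note that (by Lemma 6.1 again) the stated assumption exactly means that $\sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, \overline{m}(p, b) \subseteq \mathbb{R}^d_-$. In particular, rewriting the non-positivity constraint for each of the $d$ components of the payoff vectors, we get

$$\sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, \widetilde{m}_k(p, b) \subseteq \mathbb{R}_-$$

for all $k \in \{1, \ldots, d\}$; thus, in particular, $\sum_{b \in \mathcal{B}} \Phi_b(\sigma) \, \widetilde{m}(p, b) = \widetilde{m}(p, \sigma) \subseteq \mathbb{R}^d_-$. □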
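The two properties of the lemma can be illustrated on finite data. The sketch below assumes, for illustration only, that a set of compatible payoff vectors is the convex hull of finitely many hypothetical vertex payoffs; the component-wise relaxation is then the axis-parallel box spanned by the coordinate-wise minima and maxima. The box contains the hull, and the box lies in the negative orthant exactly when every vertex does.

```python
def componentwise_box(vertex_payoffs):
    """Coordinate-wise intervals [min_k, max_k] spanned by the vertex payoffs."""
    d = len(vertex_payoffs[0])
    return [(min(v[k] for v in vertex_payoffs), max(v[k] for v in vertex_payoffs))
            for k in range(d)]

def in_box(z, box):
    """Membership of z in the Cartesian product of the intervals."""
    return all(lo <= zk <= hi for zk, (lo, hi) in zip(z, box))

def hull_in_negative_orthant(vertex_payoffs):
    """A convex hull lies in the negative orthant iff all its vertices do."""
    return all(x <= 0 for v in vertex_payoffs for x in v)

def box_in_negative_orthant(box):
    """A box lies in the negative orthant iff all upper ends are nonpositive."""
    return all(hi <= 0 for _, hi in box)

if __name__ == "__main__":
    vertices = [(-1.0, -0.2), (-0.3, -0.9)]      # hypothetical payoff vectors
    box = componentwise_box(vertices)
    midpoint = (-0.65, -0.55)                    # a point of the hull
    corner = (-0.3, -0.2)                        # in the box, not on the segment
    print(box, in_box(midpoint, box), in_box(corner, box))
    print(hull_in_negative_orthant(vertices), box_in_negative_orthant(box))
```

The corner point shows that the relaxation is strictly larger than the original set in general, while the orthant test gives the same answer for both, which is why the relaxation costs nothing when the target is the negative orthant.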
We can then extend the result of the previous section without the bi-piecewise linearity assumption. 

Theorem 7.1 If Condition (APM) is satisfied for $\overline{m}$ and $\mathbb{R}^d_-$, then there exists a strategy for $(r, H)$-approaching $\mathbb{R}^d_-$ at a rate of the order of $T^{-1/5}$, with a constant per-round complexity.

Proof. The assumption of the theorem and Property 2 of Lemma 7.1 imply that Condition (APM) holds for $\mathbb{R}^d_-$ and $\widetilde{m}$; furthermore, the latter corresponds to a bi-piecewise linear game, as can be seen by noting, similarly to what was done in the section devoted to regret minimization (Section 6.2), that each $\widetilde{m}_k$, being based on the scalar payoff function $r_k$, is a piecewise linear function. Thus, $\widetilde{m}$ is also a piecewise linear function.

Therefore, the corresponding steps of the proof of Theorem 6.1 (or the corresponding statements in the proof of Theorem 6.3) can be adapted by replacing the set-valued mapping on $\Delta(\mathcal{I}) \times \mathcal{B}$ and its extension used therein by, respectively, $\widetilde{m}$ and its extension corresponding to Definition 6.1. The result follows. □

7.2 Approachability of polytopes in the case of general games. If the target set $\mathcal{C}$ is a polytope, then $\mathcal{C}$ can be written as the intersection of a finite number of half-spaces, i.e., there exists a finite family $\{ (e_k, f_k) \in \mathbb{R}^d \times \mathbb{R}, \ k \in \mathcal{K} \}$ such that

$$\mathcal{C} = \bigl\{ z \in \mathbb{R}^d : \ \langle z, e_k \rangle \leq f_k, \ \ \forall k \in \mathcal{K} \bigr\} .$$

Given the original (not necessarily bi-piecewise linear) game $(r, H)$, we introduce another game $(r_{\mathcal{C}}, H)$, whose payoff function $r_{\mathcal{C}} : \mathcal{I} \times \mathcal{J} \to \mathbb{R}^{\mathcal{K}}$ is defined as

$$\forall i \in \mathcal{I}, \ \ \forall j \in \mathcal{J}, \qquad r_{\mathcal{C}}(i, j) = \Bigl[ \langle r(i,j), e_k \rangle - f_k \Bigr]_{k \in \mathcal{K}} .$$

The following lemma follows by rewriting the above.

Lemma 7.2 Given a polytope $\mathcal{C}$, the $(r, H)$-approachability of $\mathcal{C}$ and the $(r_{\mathcal{C}}, H)$-approachability of $\mathbb{R}^{\mathcal{K}}_-$ are equivalent in the sense that every strategy for one problem translates to a strategy for the other problem. In addition, Condition (APM) holds for $(r, H)$ and $\mathcal{C}$ if and only if it holds for $(r_{\mathcal{C}}, H)$ and $\mathbb{R}^{\mathcal{K}}_-$.

Via the lemma above, Theorem 7.1 indicates that Condition (APM) for $(r, H)$ and $\mathcal{C}$ is a sufficient condition for the $(r, H)$-approachability of $\mathcal{C}$ and provides a strategy to do so. (The per-round complexity of this strategy depends in particular at least linearly on the cardinality of $\mathcal{K}$.)
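The reduction from a polytope to a negative orthant is a simple affine change of payoffs. The following sketch, in which the half-space description of the unit square and all function names are hypothetical choices for illustration, maps a payoff vector z to the vector of slacks with one coordinate per half-space, and checks that membership in the polytope coincides with membership of the transformed vector in the negative orthant.

```python
def to_orthant_payoff(z, halfspaces):
    """Map z to the vector (<z, e_k> - f_k)_k, one coordinate per half-space."""
    return [sum(zi * ei for zi, ei in zip(z, e)) - f for e, f in halfspaces]

def in_polytope(z, halfspaces):
    """z belongs to the polytope iff <z, e_k> <= f_k for every k."""
    return all(sum(zi * ei for zi, ei in zip(z, e)) <= f for e, f in halfspaces)

def in_negative_orthant(v):
    """All coordinates nonpositive."""
    return all(x <= 0 for x in v)

# Hypothetical polytope: the unit square [0, 1]^2 written with four half-spaces.
UNIT_SQUARE = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 0.0),
               ((0.0, 1.0), 1.0), ((0.0, -1.0), 0.0)]

if __name__ == "__main__":
    for z in [(0.5, 0.5), (1.5, 0.2)]:
        transformed = to_orthant_payoff(z, UNIT_SQUARE)
        print(z, transformed, in_polytope(z, UNIT_SQUARE),
              in_negative_orthant(transformed))
```

Because the transformation is affine, the average of transformed payoffs equals the transform of the average payoff, which is what lets a strategy for one problem be reused for the other.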

7.3 Approachability of general convex sets in the case of general games. A general closed convex set can always be approximated arbitrarily well by a polytope (where the number of vertices of the latter, however, increases as the quality of the approximation does). Therefore, via playing in regimes, Condition (APM) is also seen to be sufficient to $(r, H)$-approach any general closed convex set $\mathcal{C}$. However, the computational complexity of the resulting strategy is much larger: the per-round complexity increases over time (as the numbers of vertices of the approximating polytopes do).

Appendix A. An auxiliary result of calibration.

We prove here (1) for a given $\eta > 0$ and do so by following the methodology of Mannor and Stoltz [19]. (Note that this result is of independent interest.)

We actually assume that the covering $y^1, \ldots, y^{N_\eta}$ is slightly finer than what was required around (1) and that it forms an $\eta / \sqrt{N_{\mathcal{B}}}$-grid of $\Delta(\mathcal{B})$, i.e., that for all $y \in \Delta(\mathcal{B})$, there exists $\ell \in \{1, \ldots, N_\eta\}$ such that $\| y - y^\ell \|_2 \leq \eta / \sqrt{N_{\mathcal{B}}}$.

We recall that elements $y \in \Delta(\mathcal{B})$ are denoted by $y = (y_b)_{b \in \mathcal{B}}$ and we identify $\Delta(\mathcal{B})$ with a subset of $\mathbb{R}^{N_{\mathcal{B}}}$. In particular, $\delta_b$, the Dirac mass on a given $b \in \mathcal{B}$, is a binary vector whose only non-null component is the one indexed by $b$. Finally, we denote by

$$\mathbf{0} = (0, \ldots, 0) \qquad \text{and} \qquad \mathbf{1} = (1, \ldots, 1)$$

the elements of $\mathbb{R}^{N_{\mathcal{B}}}$ respectively formed by zeros and ones only.

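We consider a vector-valued payoff function $C : \{1, \ldots, N_\eta\} \times \mathcal{B} \to \mathbb{R}^{2 N_\eta N_{\mathcal{B}}}$ defined as follows; for all $\ell \in \{1, \ldots, N_\eta\}$ and for all $b \in \mathcal{B}$,

$$C(\ell, b) = \left( \mathbf{0}, \ \ldots, \ \mathbf{0}, \ \ y^\ell - \delta_b - \frac{\eta}{\sqrt{N_{\mathcal{B}}}} \mathbf{1}, \ \ \delta_b - y^\ell - \frac{\eta}{\sqrt{N_{\mathcal{B}}}} \mathbf{1}, \ \ \mathbf{0}, \ \ldots, \ \mathbf{0} \right) ,$$

which is a vector of $2 N_\eta$ elements of $\mathbb{R}^{N_{\mathcal{B}}}$ composed of $2 (N_\eta - 1)$ occurrences of the zero element $\mathbf{0} \in \mathbb{R}^{N_{\mathcal{B}}}$ and two non-zero elements, located in the positions indexed by $2\ell - 1$ and $2\ell$.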
We now show that the closed convex set $(\mathbb{R}_-)^{2 N_\eta N_{\mathcal{B}}}$ is $C$-approachable; to do so, we resort to the characterization stated in Theorem 2.1. To each $y \in \Delta(\mathcal{B})$ we will associate a pure action $\ell_y$ in $\{1, \ldots, N_\eta\}$ so that $C(\ell_y, y) \in (\mathbb{R}_-)^{2 N_\eta N_{\mathcal{B}}}$; note that to satisfy the necessary and sufficient condition, it is not necessary here to resort to mixed actions of the first player. The index $\ell_y$ is any index $\ell$ such that $\| y - y^\ell \|_2 \leq \eta / \sqrt{N_{\mathcal{B}}}$; such an index always exists as noted at the beginning of this proof. Indeed, one then has in particular that for each component $b \in \mathcal{B}$,

$$\bigl| y^\ell_b - y_b \bigr| \leq \| y^\ell - y \|_2 \leq \frac{\eta}{\sqrt{N_{\mathcal{B}}}} .$$

A straightforward adaptation of the proof of Theorem 2.2 then yields a strategy such that for all $\delta \in (0, 1)$ and for all strategies of the second player, with probability at least $1 - \delta$,



sup inf 



(16) 



where $M$ is a bound in Euclidean norm over $C$, e.g., $M = 4 + 2\eta$. The quantities of interest can be rewritten as



where we recall that we denoted, for all $\ell$ such that $N_\tau(\ell) > 0$, the average of their corresponding mixed actions by



The projection in $\ell^2$-norm of the quantity of interest onto $(\mathbb{R}_-)^{2 N_\eta N_{\mathcal{B}}}$ is formed by its non-positive components, so that its squared distance to $(\mathbb{R}_-)^{2 N_\eta N_{\mathcal{B}}}$ equals



inf 

ce(R_)^™'("a 



t=i 



N. 



e=i 



--{\yi,,-yt\-v/NBy 



Therefore, our target is achieved; using first that $(\,\cdot\,)_+$ is subadditive, then applying the Cauchy-Schwarz inequality,



N. 



y -VrWi-v 



< 



yb-Vr.b 



V 

Nb 



\ e=i ^ ^ 



beB 



Nb), 



< 2My]VVB 



ST ' 



where the last inequality holds, by (16), for all $\tau \geq T$ with probability at least $1 - \delta$. Choosing an integer $T_\delta$ sufficiently large so that

concludes the proof of the property stated in (1).



Appendix B. Proof of Lemma 

Proof. For all $(i, j) \in \mathcal{I} \times \mathcal{J}$, the quantity $H(i, j)$ is a probability distribution over the set of signals $\mathcal{H}$; we denote by $H_s(i, j)$ the probability mass that it puts on some signal $s \in \mathcal{H}$.



Equation (O indicates that for each pair (i, s) e I x 



E 

t=(n-l)L + l 



Hs{l,Jt) 



is a sum of $L$ elements of a martingale difference sequence, with respect to the filtration whose $t$-th element is generated by $p_n$, the pairs $(I_s, S_s)$ for $s \leq t$, and $J_s$ for $s \leq t + 1$. The conditional variances of the increments are bounded using the fact that, by definition of the strategy, $p_n$ puts a weight $\gamma$ on the uniform distribution over $\mathcal{I}$, so that $p_{i,n} \geq \gamma / N_{\mathcal{I}}$; this shows that the sum of the conditional variances is bounded by



nL 



t={n-l)L+l 



Pit.n 



LNi 



The Bernstein-Freedman inequality (see Freedman [11] or Cesa-Bianchi et al. [6], Lemma A.1) therefore indicates that with probability at least $1 - \delta$,



t=(n-l)L+l 



Plt,n 



t={n-l)L+l 



Ni 2 liVx 2 



Therefore, by summing the above inequalities over $i \in \mathcal{I}$ and $s \in \mathcal{H}$, we get (after a union bound) that with probability at least $1 - N_{\mathcal{I}} N_{\mathcal{H}} \delta$,



5„ - H{q„) < VNiNn 



7L 3 7L 



Finally, since tT„ is the projection in the £^-norm of cr„ onto the convex set J-, to which H{qj^ belongs, 
we have that 



and this concludes the proof. 



□ 



References 

[1] J. Abernethy, P. L. Bartlett, and E. Hazan. Blackwell approachability and low-regret learning are equivalent. In Proceedings of the Twenty-Fourth Annual Conference on Learning Theory (COLT'11). Omnipress, 2011.

[2] D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 
6:1-8, 1956. 

[3] D. Blackwell. Controlled random walks. In Proceedings of the International Congress of Mathematicians, 1954, Amsterdam, vol. III, pages 336-338, 1956.

[4] A. Blum and Y. Mansour. From external to internal regret. Journal of Machine Learning Research, 
8:1307-1324, 2007. 

[5] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 
2006. 

[6] N. Cesa-Bianchi, G. Lugosi, and G. Stoltz. Regret minimization under partial monitoring. Mathe- 
matics of Operations Research, 31:562-580, 2006. 

[7] X. Chen and H. White. Laws of large numbers for Hilbert space-valued mixingales with applications.
Econometric Theory, 12:284-304, 1996. 

[8] A. P. Dawid. The well-calibrated Bayesian. Journal of the American Statistical Association, 77: 
605-613, 1982. 

[9] D. Foster and R. Vohra. Asymptotic calibration. Biometrika, 85:379-390, 1998. 

[10] D. Foster and R. Vohra. Regret in the on-line decision problem. Games and Economic Behavior,
29:7-36, 1999. 

[11] D.A. Freedman. On tail probabilities for martingales. Annals of Probability, 3:100-118, 1975. 

[12] J.E. Goodman and J. O'Rourke, editors. Handbook of Discrete and Computational Geometry. Dis- 
crete Mathematics and its Applications. Chapman & Hall/CRC, Boca Raton, FL, second edition, 
2004. 






[13] S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econo- 
metrica, 68:1127-1150, 2000. 

[14] S. Hart and A. Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98: 
26-54, 2001. 

[15] E. Lehrer and E. Solan. Learning to play partially-specified equilibrium. Mimeo, 2007. 

[16] G. Lugosi, S. Mannor, and G. Stoltz. Strategies for prediction under imperfect monitoring. Mathe- 
matics of Operations Research, 33:513-528, 2008. An extended abstract was presented at COLT'07. 

[17] S. Mannor and N. Shimkin. On-line learning with imperfect monitoring. In Proceedings of the Sixteenth Annual Conference on Learning Theory (COLT'03), pages 552-567. Springer, 2003.

[18] S. Mannor and N. Shimkin. Regret minimization in repeated matrix games with variable stage duration. Games and Economic Behavior, 63(1):227-258, 2008.

[19] S. Mannor and G. Stoltz. A geometric proof of calibration. Mathematics of Operations Research, 35:721-727, 2010.

[20] S. Mannor, J. Tsitsiklis, and J. Y. Yu. Online learning with sample path constraints. Journal of 
Machine Learning Research, 10(Mar):569-590, 2009. 

[21] J.-F. Mertens, S. Sorin, and S. Zamir. Repeated games. Technical Report no. 9420, 9421, 9422, Université de Louvain-la-Neuve, 1994.

[22] V. Perchet. Calibration and internal no-regret with random signals. In Proceedings of the Twentieth International Conference on Algorithmic Learning Theory (ALT'09), pages 68-82, 2009.

[23] V. Perchet. Approachability of convex sets in games with partial monitoring. Journal of Optimization Theory and Applications, 149:665-677, 2011.

[24] V. Perchet. Internal regret with partial monitoring: Calibration-based optimal algorithms. Journal of Machine Learning Research, 2011. In press.

[25] V. Perchet and M. Quincampoix. On an unified framework for approachability in games with or 
without signals. Mimeo, 2011. 

[26] A. Piccolboni and C. Schindelhauer. Discrete prediction games with arbitrary feedback and loss. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory (COLT'01), pages 208-223, 2001.

[27] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Beyond regret. In Proceedings of the Twenty-Fourth Annual Conference on Learning Theory (COLT'11). Omnipress, 2011.

[28] J. Rambau and G. Ziegler. Projections of polytopes and the generalized Baues conjecture. Discrete 
and Computational Geometry, 16:215-237, 1996. 

[29] A. Rustichini. Minimizing regret: The general case. Games and Economic Behavior, 29:224-243, 
1999. 



Acknowledgments. Shie Mannor was partially supported by the ISF under contract 890015 and the Google Inter-university center for Electronic Markets and Auctions. Vianney Perchet benefited from the support of the ANR under grant ANR-10-BLAN 0112. Gilles Stoltz acknowledges support from the French National Research Agency (ANR) under grant EXPLO/RA ("Exploration-exploitation for efficient resource allocation") and by the PASCAL2 Network of Excellence under EC grant no. 506778.

An extended abstract of this paper appeared in the Proceedings of the 24th Annual Conference on Learning Theory (COLT'11), JMLR Workshop and Conference Proceedings, Volume 19, pages 515-536, 2011.



